Project description:The main aim of this paper is to improve a recent result on the sharpness of the Jensen inequality. The results are obtained using different Green functions and by considering a real Stieltjes measure that is not necessarily positive. Finally, some applications involving various types of f-divergences and the Zipf–Mandelbrot law are presented.
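For orientation, the classical inequality that this abstract builds on can be written as follows. This is the standard textbook form for a convex function $f$ and a nondecreasing integrator $\lambda$; the symbols $f$, $g$, $\lambda$, $x_i$ and $p_i$ are notation assumed here, and the paper's refinement for signed measures is not reproduced.

```latex
% Discrete form: f convex on an interval I, x_i in I.
\[
f\!\left(\sum_{i=1}^{n} p_i x_i\right) \le \sum_{i=1}^{n} p_i f(x_i),
\qquad p_i \ge 0,\quad \sum_{i=1}^{n} p_i = 1 .
\]
% Integral form: g continuous on [a,b] with values in I.
\[
f\!\left(\frac{\int_a^b g(t)\,d\lambda(t)}{\int_a^b d\lambda(t)}\right)
\le \frac{\int_a^b f(g(t))\,d\lambda(t)}{\int_a^b d\lambda(t)},
\qquad \lambda \text{ nondecreasing},\ \lambda(b) \neq \lambda(a).
\]
```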
Project description:Motivation: The evolution of complex diseases can be modeled as a time-dependent nonlinear dynamic system, and its progression can be divided into three states: the normal state, the pre-disease state and the disease state. The sudden deterioration of the disease can be regarded as a state transition of the dynamic system at the critical state, or pre-disease state. Detecting an individual's critical state before the onset of the disease state from single-sample data has attracted considerable research attention. Methods: In this study, we proposed a novel approach, the single-sample-based Jensen-Shannon divergence (sJSD) method, to detect early-warning signals of complex diseases before critical transitions based on individual single-sample data. The method constructs a score index based on sJSD, namely the inconsistency index (ICI). Results: The method was applied to five real datasets: prostate cancer, bladder urothelial carcinoma, influenza virus infection, cervical squamous cell carcinoma and endocervical adenocarcinoma, and pancreatic adenocarcinoma. The critical states of the five datasets, together with their corresponding sJSD signal biomarkers, were successfully identified for diagnosing and predicting each individual sample, and some "dark genes" that show no differential expression but are sensitive to the ICI score were revealed. The method is data-driven and model-free, and can be applied not only to disease prediction for individuals but also to targeted drug design for each disease. The identification of sJSD signal biomarkers is also of great significance for studying the molecular mechanism of disease progression from a dynamic perspective.
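As background for the divergence this abstract is built on, here is a minimal sketch of the standard Jensen-Shannon divergence between two discrete distributions. It is not the sJSD method or the ICI score, whose construction the abstract does not specify; the function names `kl_divergence` and `js_divergence` are chosen here for illustration, and the inputs are assumed to be probability vectors of equal length.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence to the mixture m."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is positive, so the
    # ratios inside kl_divergence never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), and it is symmetric in its two arguments.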
Project description:Software optical mark recognition (SOMR) is the process whereby information entered on a survey form or questionnaire is converted using specialized software into a machine-readable format. SOMR normally requires input fields to be completely darkened, have no internal labels, or be filled with a soft pencil, otherwise mark detection will be inaccurate. Forms can also have print and scan artefacts that further increase the error rate. This article presents a new method of mark detection that improves over existing techniques based on pixel counting and simple thresholding. Its main advantage is that it can be used under a variety of conditions and yet maintain a high level of accuracy that is sufficient for scientific applications. Field testing shows no software misclassification in 5695 samples filled by trained personnel, and only two misclassifications in 6000 samples filled by untrained respondents. Sensitivity, specificity, and accuracy were 99.73%, 99.98%, and 99.94% respectively, even in the presence of print and scan artefacts, which was superior to other methods tested. A separate direct comparison for mark detection showed a sensitivity, specificity, and accuracy respectively of 99.7%, 100.0%, 100.0% (new method), 96.3%, 96.0%, 96.1% (pixel counting), and 99.9%, 99.8%, 99.8% (simple thresholding) on clean forms, and 100.0%, 99.1%, 99.3% (new method), 98.4%, 95.6%, 96.2% (pixel counting), 100.0%, 38.3%, 51.4% (simple thresholding) on forms with print artefacts. This method is designed for bubble and box fields, while other types such as handwriting fields require separate error control measures.
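The pixel-counting baseline mentioned in this abstract can be sketched roughly as below. This illustrates only the baseline idea, not the article's new method, and every name and threshold (`dark_fraction`, `is_marked`, `dark_level`, `min_fraction`) is an assumption made for the example. A field is taken to be a greyscale grid with 0 for black and 255 for white.

```python
def dark_fraction(field, dark_level=128):
    """Return the fraction of pixels in the field darker than dark_level."""
    total = sum(len(row) for row in field)
    if total == 0:
        raise ValueError("field contains no pixels")
    dark = sum(1 for row in field for px in row if px < dark_level)
    return dark / total


def is_marked(field, dark_level=128, min_fraction=0.4):
    """Classify a field as marked when enough of its pixels are dark.

    Both thresholds are hypothetical; a real form would need them tuned.
    """
    return dark_fraction(field, dark_level) >= min_fraction


# Usage: a half-filled 2x2 field counts as marked, a blank one does not.
filled = [[0, 0], [255, 255]]
blank = [[255, 255], [255, 255]]
print(is_marked(filled), is_marked(blank))
```

A fixed fraction like this is sensitive to internal labels, light pencil and print artefacts, which are the conditions the abstract says cause inaccurate detection.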
Project description:Motivated by the need to assess HIV vaccine efficacy, previous studies proposed an extension of the discrete competing risks proportional hazards model, in which the cause of failure is replaced by a continuous mark observed only at the failure time. However, the model assumptions may fail in several ways, and no diagnostic testing procedure for this situation has been proposed. A goodness-of-fit test procedure is proposed for the stratified mark-specific proportional hazards model, in which the regression parameters depend nonparametrically on the mark and the baseline hazards depend nonparametrically on both time and the mark. The test statistics are constructed from the weighted cumulative mark-specific martingale residuals. The critical values of the proposed test statistics are approximated using the Gaussian multiplier method. The performance of the proposed tests is examined extensively in simulations for a variety of models under the null hypothesis and under different types of alternative models. An analysis of the 'Step' HIV vaccine efficacy trial using the proposed method is presented. The analysis suggests that the HIV vaccine candidate may increase susceptibility to HIV acquisition.
Project description:The C-terminal domain (CTD) of the largest RNA polymerase II (RNAPII) subunit undergoes dynamic phosphorylation to support transcription-associated events and drive the transcription cycle. In mammalian cells, it comprises 52 repeats of the heptapeptide sequence Tyr(1)–Ser(2)–Pro(3)–Thr(4)–Ser(5)–Pro(6)–Ser(7). While important functions for Ser(2)-, Ser(5)-, and Ser(7)-phosphorylation have previously been described, a new report in The EMBO Journal now suggests an unexpectedly crucial role for Thr(4) phosphorylation as well.
Project description:Mutations in the KDM5C gene are linked to X-linked mental retardation of the syndromic Claes-Jensen type. This study focuses on non-synonymous mutations in the KDM5C ARID domain and evaluates the effects of two disease-associated missense mutations (A77T and D87G) and three not-yet-classified missense mutations (R108W, N142S, and R179H). We predict the changes in the ARID domain's folding and binding free energies due to the mutations, and also study the effects of the mutations on protein dynamics. Our computational results indicate that the A77T and D87G mutations have minimal effect on KDM5C ARID domain stability and DNA binding. In parallel, the changes in unfolding free energy caused by the A77T and D87G mutations were measured in urea-induced unfolding experiments and shown to be similar to the in silico predictions. The evolutionary conservation analysis shows that the disease-associated mutations are located in a highly conserved part of the ARID structure (the N-terminal domain), indicating their importance for KDM5C function. The high conservation of the N-terminal residues suggests that either the ARID domain uses its N-terminus to interact with other KDM5C domains or the N-terminus is involved in some as yet unknown function. The analysis indicates that, among the non-classified mutations, R108W is possibly a disease-associated mutation, while N142S and R179H are probably harmless.
Project description:Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) have been successfully used to define the relative roles of genes and the environment in disease risk, helping to enable preventive and precision medicine. However, current multi-locus methods fall short in computational cost and discrimination power when detecting statistically significant interactions with different genetic effects on multiple diseases. Statistical tests for multi-locus interactions (≥2 SNPs) pose major analytical challenges because the computational cost grows exponentially with the number of SNPs in an interaction module. In this paper, we develop a simple, fast, and powerful method, named JS-MA, based on Jensen-Shannon divergence and agglomerative hierarchical clustering, to detect genome-wide multi-locus interactions associated with multiple diseases. In systematic simulations, JS-MA was more powerful and efficient than state-of-the-art association mapping tools. JS-MA was applied to real GWAS datasets for two common diseases, Rheumatoid Arthritis and Type 1 Diabetes. The results showed that JS-MA not only confirmed recently reported, biologically meaningful associations, but also identified novel multi-locus interactions. Therefore, we believe that JS-MA is suitable and efficient for full-scale analysis of multi-disease-related interactions in large GWASs.