Project description:The main aim of this paper is to give an improvement of the recent result on the sharpness of the Jensen inequality. The results given here are obtained using different Green functions and considering the case of the real Stieltjes measure, not necessarily positive. Finally, some applications involving various types of f-divergences and Zipf–Mandelbrot law are presented.
Project description:Motivation: The evolution of complex diseases can be modeled as a time-dependent nonlinear dynamic system, and its progression can be divided into three states, i.e., the normal state, the pre-disease state and the disease state. The sudden deterioration of the disease can be regarded as the state transition of the dynamic system at the critical state or pre-disease state. How to detect the critical state of an individual before the disease state based on single-sample data has attracted many researchers' attention. Methods: In this study, we proposed a novel approach, i.e., the single-sample-based Jensen-Shannon Divergence (sJSD) method, to detect the early-warning signals of complex diseases before critical transitions based on individual single-sample data. The method aims to construct a score index based on sJSD, namely, the inconsistency index (ICI). Results: This method is applied to five real datasets, including prostate cancer, bladder urothelial carcinoma, influenza virus infection, cervical squamous cell carcinoma and endocervical adenocarcinoma, and pancreatic adenocarcinoma. The critical states of the five datasets with their corresponding sJSD signal biomarkers are successfully identified to diagnose and predict each individual sample, and some "dark genes" that are not differentially expressed but are sensitive to the ICI score were revealed. This method is data-driven and model-free, and can be applied not only to disease prediction for individuals but also to targeted drug design for each disease. At the same time, the identification of sJSD signal biomarkers is also of great significance for studying the molecular mechanism of disease progression from a dynamic perspective.
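As a point of reference for the abstract above, the following is a minimal sketch of the standard Jensen-Shannon divergence between two discrete probability distributions. It does not reproduce the sJSD method or the ICI score, whose construction is not given here; the function names `kl_divergence` and `js_divergence` and the example distributions are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution m = (p + q) / 2; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical distributions, for illustration only.
    reference = [0.5, 0.3, 0.2]
    perturbed = [0.2, 0.3, 0.5]
    print(js_divergence(reference, perturbed))
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in its arguments and always finite, which is one reason it is a common choice for comparing a perturbed distribution against a reference.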
Project description:Software optical mark recognition (SOMR) is the process whereby information entered on a survey form or questionnaire is converted using specialized software into a machine-readable format. SOMR normally requires input fields to be completely darkened, have no internal labels, or be filled with a soft pencil, otherwise mark detection will be inaccurate. Forms can also have print and scan artefacts that further increase the error rate. This article presents a new method of mark detection that improves over existing techniques based on pixel counting and simple thresholding. Its main advantage is that it can be used under a variety of conditions and yet maintain a high level of accuracy that is sufficient for scientific applications. Field testing shows no software misclassification in 5695 samples filled by trained personnel, and only two misclassifications in 6000 samples filled by untrained respondents. Sensitivity, specificity, and accuracy were 99.73%, 99.98%, and 99.94% respectively, even in the presence of print and scan artefacts, which was superior to other methods tested. A separate direct comparison for mark detection showed a sensitivity, specificity, and accuracy respectively of 99.7%, 100.0%, 100.0% (new method), 96.3%, 96.0%, 96.1% (pixel counting), and 99.9%, 99.8%, 99.8% (simple thresholding) on clean forms, and 100.0%, 99.1%, 99.3% (new method), 98.4%, 95.6%, 96.2% (pixel counting), 100.0%, 38.3%, 51.4% (simple thresholding) on forms with print artefacts. This method is designed for bubble and box fields, while other types such as handwriting fields require separate error control measures.
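The sensitivity, specificity, and accuracy figures quoted above follow the usual confusion-matrix definitions. The sketch below shows those definitions only; the function name `detection_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: marked fields detected as marked
    fp: unmarked fields detected as marked
    tn: unmarked fields detected as unmarked
    fn: marked fields detected as unmarked
    """
    sensitivity = tp / (tp + fn)   # share of true marks that were detected
    specificity = tn / (tn + fp)   # share of empty fields left undetected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sens, spec, acc = detection_metrics(tp=90, fp=5, tn=95, fn=10)
    print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Because accuracy weights both classes by their frequency, a method can show high accuracy on forms with few marked fields even when its sensitivity is lower, so the three figures are best read together.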
Project description:Motivated by the need to assess HIV vaccine efficacy, previous studies proposed an extension of the discrete competing risks proportional hazards model, in which the cause of failure is replaced by a continuous mark only observed at the failure time. However, the model assumptions may fail in several ways, and no diagnostic testing procedure for this situation has been proposed. A goodness-of-fit test procedure is proposed for the stratified mark-specific proportional hazards model, in which the regression parameters depend nonparametrically on the mark and the baseline hazards depend nonparametrically on both time and the mark. The test statistics are constructed based on the weighted cumulative mark-specific martingale residuals. The critical values of the proposed test statistics are approximated using the Gaussian multiplier method. The performance of the proposed tests is examined extensively in simulations for a variety of models under the null hypothesis and under different types of alternative models. An analysis of the 'Step' HIV vaccine efficacy trial using the proposed method is presented. The analysis suggests that the HIV vaccine candidate may increase susceptibility to HIV acquisition.
Project description:Mutations in the KDM5C gene are linked to X-linked mental retardation, the syndromic Claes-Jensen-type disease. This study focuses on non-synonymous mutations in the KDM5C ARID domain and evaluates the effects of two disease-associated missense mutations (A77T and D87G) and three not-yet-classified missense mutations (R108W, N142S, and R179H). We predict the changes in the ARID domain's folding and binding free energies due to the mutations, and also study the effects of the mutations on protein dynamics. Our computational results indicate that the A77T and D87G mutations have minimal effect on the KDM5C ARID domain stability and DNA binding. In parallel, the changes in the unfolding free energy caused by the A77T and D87G mutations were measured by urea-induced unfolding experiments and were shown to be similar to the in silico predictions. The evolutionary conservation analysis shows that the disease-associated mutations are located in a highly conserved part of the ARID structure (N-terminal domain), indicating their importance for KDM5C function. The high conservation of the N-terminal residues suggests that either the ARID domain uses the N-terminal to interact with other KDM5C domains or the N-terminal is involved in some as yet unknown function. The analysis indicates that, among the non-classified mutations, R108W is possibly a disease-associated mutation, while N142S and R179H are probably harmless.
Project description:Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) have been successfully used to define the relative roles of genes and the environment in disease risk, helping to enable preventative and precision medicine. However, current multi-locus-based methods fall short in both computational cost and discrimination power when detecting statistically significant interactions with different genetic effects on multifarious diseases. Statistical tests for multi-locus interactions (≥2 SNPs) raise huge analytical challenges because the computational cost increases exponentially with the number of SNPs in an interaction module. In this paper, we develop a simple, fast, and powerful method, named JS-MA, based on Jensen-Shannon divergence and agglomerative hierarchical clustering, to detect the genome-wide multi-locus interactions associated with multiple diseases. In systematic simulations, JS-MA is more powerful and efficient than the state-of-the-art association mapping tools. JS-MA was applied to real GWAS datasets for two common diseases, i.e., Rheumatoid Arthritis and Type 1 Diabetes. The results showed that JS-MA not only confirmed recently reported, biologically meaningful associations, but also identified novel multi-locus interactions. Therefore, we believe that JS-MA is suitable and efficient for a full-scale analysis of multi-disease-related interactions in large GWASs.
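To make the computational-cost point above concrete, the sketch below counts how many candidate SNP sets an exhaustive search would have to test for a given interaction order. It illustrates the growth only and is not part of JS-MA; the function name `count_snp_combinations` and the panel size of 1,000 SNPs are assumptions chosen for illustration.

```python
import math


def count_snp_combinations(n_snps, order):
    """Number of distinct SNP sets of size `order` drawn from `n_snps` SNPs."""
    return math.comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 1000  # hypothetical panel size; real GWAS panels are far larger
    for order in range(2, 6):
        print(order, count_snp_combinations(n_snps, order))
```

Even for this small hypothetical panel, moving from pairs to triples multiplies the number of tests by several hundred, which is why exhaustive testing of higher-order interactions quickly becomes impractical.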
Project description:Abundance estimation of carnivore populations is difficult and has prompted the use of non-invasive detection methods, such as remotely triggered cameras, to collect data. To analyze photo data, studies focusing on carnivores with unique pelage patterns have used a mark-recapture framework, and studies of carnivores without unique pelage patterns have used a mark-resight framework. We compared mark-resight and mark-recapture estimation methods to estimate bobcat (Lynx rufus) population sizes, which motivated the development of a new "hybrid" mark-resight model as an alternative to traditional methods. We deployed a sampling grid of 30 cameras throughout the urban southern California study area. Additionally, we physically captured and marked a subset of the bobcat population with GPS telemetry collars. Since we could identify individual bobcats with photos of unique pelage patterns and a subset of the population was physically marked, we were able to use traditional mark-recapture and mark-resight methods, as well as the new "hybrid" mark-resight model we developed, to estimate bobcat abundance. We recorded 109 bobcat photos during 4,669 camera nights and physically marked 27 bobcats with GPS telemetry collars. Abundance estimates produced by the traditional mark-recapture, traditional mark-resight, and "hybrid" mark-resight methods were similar; however, precision differed depending on the models used. Traditional mark-recapture and mark-resight estimates were relatively imprecise, with percent confidence interval lengths exceeding 100% of point estimates. Hybrid mark-resight models produced better precision, with percent confidence intervals not exceeding 57%. The increased precision of the hybrid mark-resight method stems from using the complete encounter histories of physically marked individuals (including those never detected by a camera trap) and the encounter histories of naturally marked individuals detected at camera traps. 
This new estimator may be particularly useful for estimating abundance of uniquely identifiable species that are difficult to sample using camera traps alone.
Project description:The C-terminal domain (CTD) of the largest RNA polymerase II (RNAPII) subunit undergoes dynamic phosphorylation to support transcription-associated events and drive the transcription cycle. In mammalian cells, it comprises 52 repeats of the heptapeptide sequence Tyr(1)–Ser(2)–Pro(3)–Thr(4)–Ser(5)–Pro(6)–Ser(7). While important functions for Ser(2)-, Ser(5)-, and Ser(7)-phosphorylation have previously been described, a new report in The EMBO Journal now suggests an unexpectedly crucial role for Thr(4) phosphorylation as well.
Project description:Cardiovascular risk functions fail to identify more than 50% of patients who develop cardiovascular disease. This is especially evident in intermediate-risk patients, in whom clinical management becomes difficult. Our purpose is to analyze whether ankle-brachial index (ABI), measures of arterial stiffness, postprandial glucose, glycosylated hemoglobin, self-measured blood pressure and presence of comorbidity are independently associated with the incidence of vascular events and whether they can improve the predictive capacity of current risk equations in the intermediate-risk population. This project involves 3 groups belonging to REDIAPP (RETICS RD06/0018) from 3 Spanish regions. We will recruit a multicenter cohort of 2688 patients at intermediate risk (coronary risk between 5 and 15% or vascular death risk between 3 and 5% over 10 years) and no history of atherosclerotic disease, selected at random. We will record socio-demographic data, information on diet, physical activity, comorbidity and intermittent claudication. We will measure ABI, pulse wave velocity and cardio-ankle vascular index at rest and after light-intensity exercise. Blood pressure and anthropometric data will also be recorded. We will also quantify lipids, glucose and glycosylated hemoglobin in a fasting blood sample and postprandial capillary glucose. Eighteen months after recruitment, patients will be followed up to determine the incidence of vascular events (later follow-ups are planned at 5 and 10 years). We will analyze whether the newly proposed risk factors improve the risk functions based on classic risk factors. Primary prevention of cardiovascular diseases is a priority in the public health policy of developed and developing countries. The fundamental strategy consists of identifying people in a high-risk situation in whom preventive measures are effective and efficient. 
Improvement of these predictions in our country will have an immediate clinical and welfare impact and a short-term public health effect. ClinicalTrials.gov Identifier: NCT01428934.