Project description: As newborn screening programs transition from paper-based data exchange toward automated, electronic methods, significant data-exchange challenges must be overcome. This article outlines a data model that maps newborn screening data elements associated with patient demographics, birthing facilities, laboratories, result reporting, and follow-up care to the LOINC, SNOMED CT, ICD-10-CM, and HL7 healthcare standards. The framework lays the foundation for implementing standardized electronic data exchange across newborn screening programs, leading to greater data interoperability. Adopting this model can accelerate the implementation of electronic data exchange between healthcare providers and newborn screening programs, standardizing exchange across programs and ultimately improving health outcomes for all newborns.
Project description: Background: There are three main problems associated with the virtual screening of bioassay data: access to freely available curated data, the number of false positives that arise in the physical primary screening process, and the fact that the data are highly imbalanced, with a low ratio of Active to Inactive compounds. This paper first discusses these three problems, then applies a selection of Weka cost-sensitive classifiers (Naive Bayes, SVM, C4.5 and Random Forest) to a variety of bioassay datasets. Results: Pharmaceutical bioassay data are not readily available to the academic community. The data held at PubChem are not curated, and there is a lack of detailed cross-referencing between Primary and Confirmatory screening assays. Because of that missing cross-referencing, only a shallow analysis of false positives in the primary screening process was possible; in the six cases found, the average false-positive rate from the High-Throughput Primary screen was quite high, at 64%. For the cost-sensitive classification, Weka's implementations of the Support Vector Machine and the C4.5 decision tree learner performed relatively well. It was also found that the appropriate setting of the Weka cost matrix depends on the base classifier used, not solely on the ratio of class imbalance. Conclusions: Understandably, pharmaceutical data are hard to obtain. However, it would benefit both the pharmaceutical industry and academia if curated primary screening data and the corresponding confirmatory data were made available. Two benefits could be gained by applying virtual screening techniques to bioassay data: the search space of compounds to be physically screened is reduced, and analysis of the false positives that occur in the primary screening process may improve the technology.
The number of false positives arising from primary screening raises the question of whether this type of data should be used for virtual screening at all. Care is needed when using Weka's cost-sensitive classifiers: across-the-board misclassification costs based purely on class ratios should not be used when comparing different classifiers on the same dataset.
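The cost-matrix point above can be illustrated with a small sketch. The paper uses Weka; here, as an assumption for illustration only, scikit-learn's `class_weight` parameter plays the role of Weka's cost matrix, and the same imbalance-ratio-derived cost is applied to two different base learners so their behaviour can be compared. The data are synthetic.

```python
# Illustrative sketch (not the paper's Weka setup): cost-sensitive learning
# on imbalanced data via per-class misclassification weights.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Synthetic imbalanced screen: ~5% "Active" (class 1), mimicking bioassay data.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# The same ratio-based cost ({0: 1, 1: 19}, i.e. the class imbalance) is tried
# on two base learners; per the paper, the best cost is classifier-dependent,
# so identical costs need not give comparable behaviour.
for name, clf in [("SVM", LinearSVC(class_weight={0: 1, 1: 19}, max_iter=5000)),
                  ("Tree", DecisionTreeClassifier(class_weight={0: 1, 1: 19},
                                                  random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, "active-class recall:", recall_score(y_te, clf.predict(X_te)))
```

Weighting the minority class raises its recall at the cost of more false alarms on the majority class; tuning that trade-off per classifier is exactly the caution the conclusion urges.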
Project description: Scope: Jurisdiction-based Early Hearing Detection and Intervention Information Systems (EHDI-IS) collect data on the hearing screening and follow-up status of infants across the United States. These systems serve as tools that assist EHDI programs' staff and partners in their tracking activities and provide a variety of data reports to help ensure that all children who are deaf/hard of hearing (DHH) are identified early and receive recommended intervention services. The quality and timeliness of the data collected with these systems are crucial to meeting these goals effectively. Methodology: Forty-eight EHDI programs, funded by the Centers for Disease Control and Prevention (CDC), evaluated the accuracy, completeness, uniqueness, and timeliness of their hearing screening data, as well as the acceptability (i.e., willingness to report) of the EHDI-IS among data reporters (2013-2016). This article describes the evaluations conducted and presents the findings from these evaluation activities. Conclusions: Most state EHDI programs receive newborn hearing screening results from hospitals and birthing facilities in a consistent way, and data reporters are willing to report according to established protocols. However, additional efforts are needed to improve the accuracy and completeness of reported demographic data, of results for infants transferred from other hospitals, and of results for infants admitted to the Neonatal Intensive Care Unit.
Project description: In ultra-high-dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a data set with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say in the thousands, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, interaction selection consistency is hard to achieve in high-dimensional settings: interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues with forward-selection-based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are simple and fast to implement: no complex optimization tools are needed, since only OLS-type calculations are involved; the algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirements are minimal; and the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess the sure screening property in ultra-high-dimensional settings. Numerical examples demonstrate their finite-sample performance.
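The key computational idea — generating interaction candidates on the fly from already-selected main effects so the full n × (p² + 3p)/2 matrix is never formed — can be sketched as follows. This is an illustrative simplification in the spirit of iFOR (with the strong-heredity candidate set of iFORM), not the authors' algorithm: the data, stopping rule (a fixed four steps), and RSS-based selection criterion are assumptions for the example.

```python
# Illustrative forward selection of hierarchical interactions: candidates
# are all unselected main effects plus pairwise products of already-selected
# mains, so only p + O(k^2) columns exist at step k, never the full
# n x (p^2 + 3p)/2 augmented matrix.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
# Sparse hierarchical truth: y = x0 + x1 + x0*x1 + noise.
y = X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)

def rss(design, y):
    """Residual sum of squares of an OLS fit (the only computation needed)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

selected = []                  # labels of chosen terms
design = np.ones((n, 1))       # start from the intercept-only model
for _ in range(4):             # fixed number of steps, for illustration
    sel_main = [j for kind, j in selected if kind == "main"]
    candidates = [(("main", j), X[:, j]) for j in range(p)
                  if ("main", j) not in selected]
    # Interactions only among selected mains: strong heredity.
    candidates += [(("inter", (a, b)), X[:, a] * X[:, b])
                   for a in sel_main for b in sel_main
                   if a < b and ("inter", (a, b)) not in selected]
    label, col = min(candidates,
                     key=lambda c: rss(np.column_stack([design, c[1]]), y))
    selected.append(label)
    design = np.column_stack([design, col])

print(selected)  # with this seed, the true mains and their interaction should appear
```

Each step costs only OLS fits on small designs, so the work grows linearly in p for sparse models, matching the feasibility claim for p ≫ n.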
Project description: Colorectal neoplasia causes bleeding, enabling detection using Faecal Occult Blood tests (FOBt). The National Health Service (NHS) Bowel Cancer Screening Programme (BCSP) guaiac-based FOBt (gFOBt) kits contain six sample windows (or 'spots'), and each kit returns a positive, unclear or negative result. Test kits with five or six positive windows are termed 'abnormal' and the subject is referred for further investigation, usually colonoscopy. If one to four windows are positive, the result is initially 'unclear' and up to two further kits are submitted; further positivity leads to colonoscopy ('weak positive'). If no further blood is detected, the test is deemed 'normal' and the subject is tested again in two years' time. We studied the association between spot positivity percentage (SP%) and neoplasia. Subjects in the Southern Hub completing the first of two consecutive episodes between April 2009 and March 2011 were studied. Each episode included up to three kits and therefore a maximum of 18 windows (spots). For each positivity combination, the percentage of positive spots out of the total number of spots completed by an individual in a single screening episode was derived and named 'SP%'. Fifty-five combinations of SP% can occur if the position of positive/negative spots on the same test card is ignored. The proportion of individuals in whom neoplasia was identified in Episode 2 was derived for each of the 55 spot combinations. In addition, the Episode 1 spot pattern was analysed for subjects with cancer detected in Episode 2. During Episode 2, 284,261 subjects completed gFOBt screening and colonoscopies were performed on 3891 (1.4%) of them. At colonoscopy, cancer was detected in 7.4% (n=286), and a further 39.8% (n=1550) had adenomas.
Cancer was detected in 21.3% of subjects with an abnormal first kit (five or six positive spots) and in 5.9% of those with a weak positive test result. The proportion of cancers detected was positively correlated with SP%, with a linear R² of 0.89. As SP% increased from 11% to 100%, the colorectal cancer (CRC) detection rate increased from 4% to 25%. At the lower SP% values, from 11% to 25%, the CRC risk was relatively static at ~4%; above an SP% of 25%, every 10-percentage-point increase in SP% was associated with an increase in cancer detection of 2.5 percentage points. This study demonstrated a strong correlation between SP% and cancer detection within the NHS BCSP. At the population level, subjects' cancer risk ranged from 4% to 25% and correlated with the gFOBt spot pattern. Some subjects with an SP% of 11% proceed to colonoscopy, whereas others with an SP% of 22% do not. Offering colonoscopy to patients with four positive spots in kit 1 (SP% 22%) would, we estimate, detect cancer in ~4% of cases and increase overall colonoscopy volume by 6%. This study also demonstrated how screening programme data can be used to guide ongoing implementation and inform other programmes.
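The SP% definition above is a simple ratio; a short helper (a hypothetical function, not from the study) makes the apparent anomaly concrete: four positive spots on kit 1 of a fully completed three-kit episode gives an SP% of ~22%, higher than a completed episode yielding an SP% of 11%, yet only the latter pattern currently triggers colonoscopy.

```python
# Hypothetical helper: spot positivity percentage (SP%) as defined in the
# study, i.e. positive spots as a share of all spots completed in one
# screening episode (up to 3 kits x 6 windows = 18 spots).
def spot_positivity_pct(positive_spots, total_spots):
    if not 0 < total_spots <= 18:
        raise ValueError("an episode comprises between 1 and 18 spots")
    if not 0 <= positive_spots <= total_spots:
        raise ValueError("positive spots cannot exceed spots completed")
    return 100 * positive_spots / total_spots

# Four positive spots across an 18-spot episode: the SP% 22% case above.
print(spot_positivity_pct(4, 18))
# Two positive spots across 18: the SP% 11% case that does reach colonoscopy.
print(spot_positivity_pct(2, 18))
```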
Project description: The National Toxicology Program is developing a high-throughput screening (HTS) program to set testing priorities for compounds of interest, to identify mechanisms of action, and potentially to develop predictive models for human toxicity. This program will generate extensive data on the activity of large numbers of chemicals in a wide variety of biochemical- and cell-based assays. The first step in relating patterns of response among batteries of HTS assays to in vivo toxicity is to distinguish between positive and negative compounds in individual assays. Here, the authors report on a statistical approach developed to classify compounds as positive or negative in an HTS cytotoxicity assay, based on data collected from screening 1353 compounds for concentration-response effects in 9 human and 4 rodent cell types. In this approach, the authors develop methods to normalize the data (removing bias due to the location of the compound on the 1536-well plates used in the assay) and to analyze for concentration-response relationships. Various statistical tests for identifying significant concentration-response relationships and for addressing reproducibility are developed and presented.
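The plate-position normalization step can be sketched with a common approach; the article's exact procedure is not reproduced here, so the median-centering scheme and all numbers below are assumptions for illustration.

```python
# Illustrative sketch of plate-position normalization (not the article's
# specific method): readings on a 1536-well plate are corrected by removing
# row and column medians, so spatial bias (e.g. edge effects) does not leak
# into the subsequent concentration-response analysis.
import numpy as np

rng = np.random.default_rng(1)
plate = rng.normal(100.0, 5.0, size=(32, 48))   # 32 x 48 = 1536 wells
plate[:, :4] += 20.0                            # artificial edge-column bias

row_med = np.median(plate, axis=1, keepdims=True)
col_med = np.median(plate, axis=0, keepdims=True)
# Median-centered residuals, shifted back to the plate's overall level.
normalized = plate - row_med - col_med + np.median(plate)
```

After correction, the column medians are nearly flat, so a compound's apparent activity no longer depends on where it sat on the plate.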
Project description: Health insurance is associated with increased utilization of cancer screening services. Data on breast, prostate and colorectal cancer screening were abstracted from the 2012 Behavioral Risk Factor Surveillance System. This data article includes two sets of analyses: (i) the use of cancer screening among individuals in the low-income bracket and (ii) determinants of each of the three approaches to colorectal cancer screening (fecal occult blood test, colonoscopy, and sigmoidoscopy plus fecal occult blood test). Covariates included educational attainment, residency, and access to a health care provider. The data supplement our original research article on the effect of Medicare eligibility on cancer screening utilization, "The impact of Medicare eligibility on cancer screening behaviors" [1].
Project description: Unbiased discovery approaches have the potential to uncover neurobiological insights into CNS disease and lead to the development of therapies. Here, we review lessons learned from imaging-based screening approaches and recent advances in these areas, including powerful new computational tools that synthesize complex data into more useful knowledge that can reliably guide future research and development.
Project description: Expression quantitative trait locus (eQTL) studies are a powerful tool for identifying genetic variants that affect messenger RNA levels. Since gene expression is controlled by a complex network of gene-regulating factors, one way to identify these factors is to search for interaction effects between genetic variants and the mRNA levels of transcription factors (TFs) and their respective target genes. However, identifying interaction effects in gene expression data poses a variety of methodological challenges, and it has become clear that such analyses should be conducted and interpreted with caution. Investigating the validity and interpretability of several interaction tests when screening for eQTL SNPs whose effect on target gene expression is modified by the expression level of a transcription factor, we characterized two important methodological issues. First, we stress the scale dependency of interaction effects and highlight that commonly applied transformations of gene expression data can induce or remove interactions, making interpretation of results more challenging. Second, we demonstrate that, for moderate to strong interaction effects of the order that may reasonably be expected in eQTL studies, standard interaction screening can be biased by the heteroscedasticity that true interactions themselves induce. Using simulation and real-data analysis, we outline a set of reasonable minimum conditions and sample size requirements for reliable detection of variant-by-environment and variant-by-TF interactions using the heteroscedasticity-consistent covariance-based approach.