Dataset Information

Reducing False-Positive Results in Newborn Screening Using Machine Learning.

ABSTRACT: Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest's ability to classify GA-1 false positives was found similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.
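The phrase "without changing the sensitivity" in the abstract can be illustrated with a small sketch. It assumes each screen positive has a classifier score (for example, a Random Forest probability of being a true case) and that the decision threshold is set to the lowest score among confirmed cases, so no confirmed case is lost. The function name, the threshold rule, and the toy numbers below are illustrative assumptions, not the authors' published procedure or data.

```python
def false_positive_reduction(scores, is_case):
    """Pick the highest threshold that still retains every confirmed case,
    then report how many original false positives remain above it.

    scores  -- classifier score per screen positive (higher = more case-like)
    is_case -- True for a confirmed case, False for a false positive
    Returns (threshold, remaining_false_positives, reduction_fraction).
    """
    case_scores = [s for s, c in zip(scores, is_case) if c]
    fp_scores = [s for s, c in zip(scores, is_case) if not c]
    if not case_scores or not fp_scores:
        raise ValueError("need at least one confirmed case and one false positive")

    # Sensitivity stays unchanged as long as every confirmed case scores
    # at or above the threshold, so use the lowest confirmed-case score.
    threshold = min(case_scores)

    # False positives at or above the threshold would still be reported.
    remaining = sum(1 for s in fp_scores if s >= threshold)
    reduction = 1 - remaining / len(fp_scores)
    return threshold, remaining, reduction


# Hypothetical toy scores, invented purely for illustration.
scores = [0.92, 0.81, 0.40, 0.75, 0.30, 0.10, 0.55, 0.05]
is_case = [True, True, True, False, False, False, False, False]

threshold, remaining, reduction = false_positive_reduction(scores, is_case)
print(f"threshold={threshold}, remaining FPs={remaining}, reduction={reduction:.0%}")
```

In this toy example, three of the five false positives fall below the lowest confirmed-case score, so they could be dismissed without missing any confirmed case. In practice the threshold would need to be chosen on held-out data (for example, via cross-validation) rather than on the same cases used for evaluation.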

SUBMITTER: Peng G

PROVIDER: S-EPMC7080200 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Reducing False-Positive Results in Newborn Screening Using Machine Learning.

Peng G, Tang Y, Cowan TM, Enns GM, Zhao H, Scharfe C

International Journal of Neonatal Screening, 2020-03-03, issue 1

PMID: 32190768

Similar Datasets

Project description: Background: Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools. Results: We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders-of-magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 is able to achieve high numbers of true-positive corrections comparable to its competitors. On a simulated full human dataset with 914M reads CARE 2.0 generates only 1.2M false positives (FPs) (and 801.4M true positives (TPs)) at a highly competitive runtime while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data. Conclusion: False-positive corrections can negatively influence down-stream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. Thus, higher-quality datasets are produced which improve k-mer analysis and de-novo assembly in real-world datasets which demonstrates the applicability of machine learning techniques in the context of sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE .

Project description: Introduction: Early cancer detection can significantly improve patient outcomes and reduce mortality rates. Novel cancer screening approaches, including multi-cancer early detection tests, have been developed. Cost-utility analyses will be needed to examine their value, and these models require health state utilities. The purpose of this study was to estimate the disutility (i.e., decrease in health state utility) associated with false-positive cancer screening results. Methods: In composite time trade-off interviews using a 1-year time horizon, UK general population participants valued 10 health state vignettes describing cancer screening with true-negative or false-positive results. Each false-positive vignette described a common diagnostic pathway following a false-positive result suggesting lung, colorectal, breast, or pancreatic cancer. Every pathway ended with a negative result (no cancer detected). The disutility of each false positive was calculated as the difference between the true-negative and each false-positive health state, and because of the 1-year time horizon, each disutility can be interpreted as a quality-adjusted life-year decrement associated with each type of false-positive experience. Results: A total of 203 participants completed interviews (49.8% male; mean age = 42.0 years). The mean (SD) utility for the health state describing a true-negative result was 0.958 (0.065). Utilities for false-positive health states ranged from 0.847 (0.145) to 0.932 (0.059). Disutilities for false positives ranged from -0.031 to -0.111 (-0.041 to -0.111 for lung cancer; -0.079 for colorectal cancer; -0.031 to -0.067 for breast cancer; -0.048 to -0.088 for pancreatic cancer). Conclusion: All false-positive results were associated with a disutility. Greater disutility was associated with more invasive follow-up diagnostic procedures, longer duration of uncertainty regarding the eventual diagnosis, and perceived severity of the suspected cancer type. Utility values estimated in this study would be useful for economic modeling examining the value of cancer screening procedures.
