Dataset Information

UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene.

ABSTRACT: Variations in gene expression levels play an important role in tumors. There are numerous methods to identify differentially expressed genes in high-throughput sequencing. Several algorithms endeavor to identify distinctive genetic patterns susceptible to particular diseases. Although these processes have proven successful, the estimated number of non-differentially expressed genes, as measured by the false discovery rate (FDR), has a large standard deviation, and the misidentification rate (type I error) grows rapidly as the number of genes to be tested becomes larger. In this study we developed a new method, Unit Gamma Measurement (UGM), which accounts for the distribution of multiple hypothesis test statistics and could reduce the dependency problem. Simulated expression profile data and breast cancer RNA-Seq data were used to test the accuracy of UGM. The results show that the number of non-differentially expressed genes identified by UGM is very close to that in the real-evidence data, and that UGM also has a smaller standard error, range, interquartile range and RMS error. In addition, UGM can be used to screen many breast cancer-associated genes, such as BRCA1, BRCA2, PTEN and BRIP1, and provides a more accurate, robust and efficient method for identifying differentially expressed genes in high-throughput sequencing.
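The abstract does not describe how UGM itself is computed, so the sketch below does not implement it. Instead, as an illustrative assumption, it shows the conventional kind of baseline the abstract compares against: Storey's estimator of the number of true null hypotheses (non-differentially expressed genes) together with the Benjamini-Hochberg FDR procedure. Both are repeated over simulated p-values so that the standard error, range, interquartile range and RMS error of the estimate can be measured. The gene counts, the number of replicates, the Beta(0.1, 1) model for differentially expressed genes and all function names are hypothetical choices made for this sketch.

```python
import numpy as np


def storey_m0(pvals, lam=0.5):
    """Estimate the number of true nulls (non-DE genes) with Storey's pi0 estimator."""
    m = pvals.size
    # Under the null, p-values are uniform, so about (1 - lam) * m0 of them exceed lam.
    pi0 = np.mean(pvals > lam) / (1.0 - lam)
    return min(pi0, 1.0) * m


def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    thresholds = q * np.arange(1, m + 1) / m
    below = np.nonzero(ranked <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        k = below[-1]  # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True
    return reject


def simulate(m=10_000, m0=9_000, reps=200, seed=0):
    """Hypothetical simulation: m0 null genes with uniform p-values, the rest DE."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    n_rejected = np.empty(reps, dtype=int)
    for r in range(reps):
        null_p = rng.uniform(size=m0)
        alt_p = rng.beta(0.1, 1.0, size=m - m0)  # assumed: DE p-values skewed toward 0
        p = np.concatenate([null_p, alt_p])
        estimates[r] = storey_m0(p)
        n_rejected[r] = benjamini_hochberg(p).sum()
    return estimates, n_rejected


est, rej = simulate()
true_m0 = 9_000
q1, q3 = np.percentile(est, [25, 75])
summary = {
    "mean": est.mean(),
    "std_error": est.std(ddof=1),
    "range": est.max() - est.min(),
    "iqr": q3 - q1,
    "rmse": np.sqrt(np.mean((est - true_m0) ** 2)),
}
print(summary)
```

Because Storey's estimator counts every p-value above the cut-off, the few differentially expressed genes with large p-values inflate it slightly, so the mean estimate tends to sit a little above the true number of null genes. A table of these summary statistics for competing estimators is the kind of comparison the abstract reports for UGM.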

SUBMITTER: Liu C

PROVIDER: S-EPMC6927121 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene.

Liu Chengyou, Zhou Leilei, Wang Yuhe, Tian Shuchang, Zhu Junlin, Qin Hang, Ding Yong, Jiang Hongbing

Theoretical Biology & Medical Modelling, 23 Dec 2019, issue 1

PMID: 31865918

Similar Datasets

Project description: The published literature presents several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, contain a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certifying a large number of participants in different geographical regions. Our case study is the training at Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from the Telecentros.BR dataset and from the socioeconomic and telecommunications infrastructure indicators of the participants' municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always included tutors, contradicting the expectations of the training project, which aimed for intense collaboration among participants regardless of geographic region); (iii) the social network of participants influences the success of the training: given evidence that a community member's degree is in the highest range, the probability of that individual completing the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar characteristics, presented a high probability of certification, associated with the highest degree on the social networking platform.

Project description: The Coronavirus disease 2019 (COVID-19) pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has resulted in economic and social lockdowns in most countries around the globe. Early identification of infected individuals is regarded as one of the most important prerequisites for fighting the pandemic and for returning to a 'New Normal'. Large-scale testing is therefore crucial, but it faces several challenges, including shortages of sample collection tools and molecular biological reagents, and the need for secure electronic communication of medical reports. We present the successful establishment of a holistic SARS-CoV-2 testing platform that covers proband registration, sample collection and shipment, sample testing, and report issuing. The RT-PCR-based virus detection, which is central to the platform, was extensively validated: sensitivity and specificity were 96.8% and 100%, respectively, and intra-run and inter-run precision were <3%. A novel type of sample swab and an in-house-developed RNA extraction system were shown to perform as well as commercially available products. The resulting flexibility ensures independence from current bottlenecks in SARS-CoV-2 testing. Based on our technology, we offered testing at local, national, and global levels. In the present study, we report the results of approx. 18,000 SARS-CoV-2 tests in almost 10,000 individuals from a low-prevalence SARS-CoV-2 area in a homogeneous geographical region in north-eastern Germany over a period of 10 weeks (21 March to 31 May 2020). Among the probands, five SARS-CoV-2-positive cases were identified. Comparative analysis of the corresponding virus genomes revealed diverse origins, spanning three of the five currently recognized SARS-CoV-2 phylogenetic clades. Our study exemplifies how preventive SARS-CoV-2 testing can be set up in a rapid and flexible manner. The application of our test has enabled the safe maintenance or resumption of critical local infrastructure, e.g., nursing homes, where more than 5000 elderly residents and caretakers were tested. The strategy outlined in the present study may serve as a blueprint for implementing large-scale preventive SARS-CoV-2 testing elsewhere.
