Dataset Information

Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study.

ABSTRACT:

Background

Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical frequentist statistical theory, which assumes a fixed set of covariates in the model. Ignoring the selection step leads to over-optimistic selection and replicability issues.

Methods

We compared proposals for selective inference targeting the submodel parameters of the Lasso and its extension, the adaptive Lasso: sample splitting, selective inference conditional on the Lasso selection (SI), and universally valid post-selection inference (PoSI). We studied the properties of the proposed selective confidence intervals available via R software packages using a neutral simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability.
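As an illustration of the simplest of the three approaches, sample splitting can be sketched as follows: select variables with the Lasso on one half of the data, then fit ordinary least squares on the other half using only the selected variables. This is a minimal sketch under stated assumptions, not the authors' code. The study used R packages, whereas this example uses a small NumPy coordinate-descent Lasso as a stand-in. The function names (lasso_cd, split_sample_ci), the penalty value lam=0.2, the simulated data, and the normal-quantile intervals (instead of t-quantiles) are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    Hypothetical helper: assumes centred columns of X and centred y.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def split_sample_ci(X, y, lam, level=0.95, seed=0):
    """Select on one random half, estimate and build intervals on the other."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    sel, inf = idx[: n // 2], idx[n // 2:]

    # Selection half: Lasso on centred data
    Xs = X[sel] - X[sel].mean(axis=0)
    ys = y[sel] - y[sel].mean()
    selected = np.flatnonzero(lasso_cd(Xs, ys, lam))
    if selected.size == 0:
        return selected, np.empty((0, 2))

    # Inference half: OLS with intercept on the selected columns only
    Xi = np.column_stack([np.ones(inf.size), X[inf][:, selected]])
    yi = y[inf]
    coef, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    resid = yi - Xi @ coef
    sigma2 = resid @ resid / (Xi.shape[0] - Xi.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xi.T @ Xi)))[1:]

    # Normal approximation to the t-quantile (assumption for simplicity)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    est = coef[1:]
    return selected, np.column_stack([est - z * se, est + z * se])


# Simulated example: 2 true predictors among 10 (illustrative values)
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]
y = X @ beta_true + rng.standard_normal(n)

selected, ci = split_sample_ci(X, y, lam=0.2)
for j, (lo, hi) in zip(selected, ci):
    print(f"x{j}: [{lo:.2f}, {hi:.2f}]")
```

Because selection and estimation use disjoint observations, the intervals from the second half are valid for the selected submodel without further adjustment. The cost is that each step uses only half of the data, which is the inefficiency noted in the Results.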

Results

Frequentist properties of selective confidence intervals by the SI method were generally acceptable, but the claimed selective coverage levels were not attained in all scenarios, in particular with the adaptive Lasso. The actual coverage of the extremely conservative PoSI method exceeded the nominal levels, and this method also required the greatest computational effort. Sample splitting achieved acceptable actual selective coverage levels, but the method is inefficient and leads to less accurate point estimates. The choice of inference method had a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results.

Conclusions

Despite violating nominal coverage levels in some scenarios, selective inference conditional on the Lasso selection is our recommended approach for most cases. If simplicity is strongly favoured over efficiency, then sample splitting is an alternative. If only a few predictors (i.e. up to 5) undergo variable selection, or if the avoidance of false positive claims of significance is a concern, then the conservative approach of PoSI may be useful. For the adaptive Lasso, SI should be avoided and only PoSI and sample splitting are recommended. In summary, we find selective inference useful to assess the uncertainties in the importance of individual selected predictors for future applications.

SUBMITTER: Kammer M

PROVIDER: S-EPMC9316707 | biostudies-literature | 2022 Jul

REPOSITORIES: biostudies-literature

Publications

Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study.

Michael Kammer, Daniela Dunkler, Stefan Michiels, Georg Heinze

BMC Medical Research Methodology, 2022 Jul 26, issue 1

PMID: 35883041

Similar Datasets

Project description:

Background: Working together efficiently and effectively presents a significant challenge in large-scale, complex, interdisciplinary research projects. Collaboratories are a nascent method to help meet this challenge. However, formal collaboratories in biomedical research centers are the exception rather than the rule.

Objective: The main purpose of this paper is to compare and describe two collaboratories that used off-the-shelf tools and relatively modest resources to support the scientific activity of two biomedical research centers. The two centers were the Great Lakes Regional Center for AIDS Research (HIV/AIDS Center) and the New York University Oral Cancer Research for Adolescent and Adult Health Promotion Center (Oral Cancer Center).

Methods: In each collaboratory, we used semistructured interviews, surveys, and contextual inquiry to assess user needs and define the technology requirements. We evaluated and selected commercial software applications by comparing their feature sets with requirements and then pilot-testing the applications. Local and remote support staff cooperated in the implementation and end user training for the collaborative tools. Collaboratory staff evaluated each implementation by analyzing utilization data, administering user surveys, and functioning as participant observers.

Results: The HIV/AIDS Center primarily required real-time interaction for developing projects and attracting new participants to the center; the Oral Cancer Center, on the other hand, mainly needed tools to support distributed and asynchronous work in small research groups. The HIV/AIDS Center's collaboratory included a center-wide website that also served as the launch point for collaboratory applications, such as NetMeeting, Timbuktu Conference, PlaceWare Auditorium, and iVisit. The collaboratory of the Oral Cancer Center used Groove and Genesys Web conferencing. The HIV/AIDS Center was successful in attracting new scientists to HIV/AIDS research, and members used the collaboratory for developing and implementing new research studies. The Oral Cancer Center successfully supported highly distributed and asynchronous research, and the collaboratory facilitated real-time interaction for analyzing data and preparing publications.

Conclusions: The two collaboratory implementations demonstrated the feasibility of supporting biomedical research centers using off-the-shelf commercial tools, but they also identified several barriers to successful collaboration. These barriers included computing platform incompatibilities, network infrastructure complexity, variable availability of local versus remote IT support, low computer and collaborative software literacy, and insufficient maturity of available collaborative software. Factors enabling collaboratory use included collaboration incentives through funding mechanism, a collaborative versus competitive relationship of researchers, leadership by example, and tools well matched to tasks and technical progress. Integrating electronic collaborative tools into routine scientific practice can be successful but requires further research on the technical, social, and behavioral factors influencing the adoption and use of collaboratories.

Project description:

Background: Structured, systematic methods to formulate consensus recommendations, such as the Delphi process or nominal group technique, among others, provide the opportunity to harness the knowledge of experts to support clinical decision making in areas of uncertainty. They are widely used in biomedical research, in particular where disease characteristics or resource limitations mean that high-quality evidence generation is difficult. However, poor reporting of methods used to reach a consensus - for example, not clearly explaining the definition of consensus, or not stating how consensus group panellists were selected - can potentially undermine confidence in this type of research and hinder reproducibility. Our objective is therefore to systematically develop a reporting guideline to help the biomedical research and clinical practice community describe the methods or techniques used to reach consensus in a complete, transparent, and consistent manner.

Methods: The ACCORD (ACcurate COnsensus Reporting Document) project will take place in five stages and follow the EQUATOR Network guidance for the development of reporting guidelines. In Stage 1, a multidisciplinary Steering Committee has been established to lead and coordinate the guideline development process. In Stage 2, a systematic literature review will identify evidence on the quality of the reporting of consensus methodology, to obtain potential items for a reporting checklist. In Stage 3, Delphi methodology will be used to reach consensus regarding the checklist items, first among the Steering Committee, and then among a broader Delphi panel comprising participants with a range of expertise, including patient representatives. In Stage 4, the reporting guideline will be finalised in a consensus meeting, along with the production of an Explanation and Elaboration (E&E) document. In Stage 5, we plan to publish the reporting guideline and E&E document in open-access journals, supported by presentations at appropriate events. Dissemination of the reporting guideline, including a website linked to social media channels, is crucial for the document to be implemented in practice.

Discussion: The ACCORD reporting guideline will provide a set of minimum items that should be reported about methods used to achieve consensus, including approaches ranging from simple unstructured opinion gatherings to highly structured processes.
