Dataset Information

Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study.

ABSTRACT: Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.

SUBMITTER: Lalor JP

PROVIDER: S-EPMC7685075 | biostudies-literature | 2018 Oct-Nov

REPOSITORIES: biostudies-literature


Publications

Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study.

Lalor JP, Wu H, Munkhdalai T, Yu H

Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018-10-01


PMID: 33241233

Similar Datasets

Project description:

Purpose: This study aimed to investigate the impact of a deep learning (DL)-based denoising method on the image quality and lesion detectability of 18F-FDG positron emission tomography (PET) images.

Methods: Fifty-two oncological patients undergoing an 18F-FDG PET/CT imaging with an acquisition of 180 s per bed position were retrospectively included. The list-mode data were rebinned into four datasets: 100% (reference), 75%, 50%, and 33.3% of the total counts, and then reconstructed by OSEM algorithm and post-processed with the DL and Gaussian filter (GS). The image quality was assessed using a 5-point Likert scale, and FDG-avid lesions were counted to measure lesion detectability. Standardized uptake values (SUVs) in livers and lesions, liver signal-to-noise ratio (SNR) and target-to-background ratio (TBR) values were compared between the methods. Subgroup analyses compared TBRs after categorizing lesions based on parameters like lesion diameter, uptake or patient habitus.

Results: The DL method showed superior performance regarding image noise and inferior performance regarding lesion contrast in the qualitative assessment. More than 96.8% of the lesions were successfully identified in DL images. Excellent agreements on SUV in livers and lesions were found. The DL method significantly improved the liver SNR for count reduction down to 33.3% (p < 0.001). Lesion TBR was not significantly different between DL and reference images of the 75% dataset; furthermore, there was no significant difference either for lesions of > 10 mm or lesions in BMIs of > 25. For the 50% dataset, there was no significant difference between DL and reference images for TBR of lesion with > 15 mm or higher uptake than liver.

Conclusions: The developed DL method improved both liver SNR and lesion TBR indicating better image quality and lesion conspicuousness compared to GS method. Compared with the reference, it showed non-inferior image quality with reduced counts by 25-50% under various conditions.

