Dataset Information

Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks.

ABSTRACT: Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 images or fewer per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts. For the first task, our approach reached >92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and >96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
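
The classification stage described in the abstract (fixed feature vectors passed to a linear support vector machine) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the feature vectors are synthetic Gaussian clusters standing in for VGG16 activations, the problem is reduced to two classes, and the SVM is trained with a simple Pegasos-style stochastic subgradient method chosen only to keep the example dependency-free.

```python
import random


def make_synthetic_features(n_per_class, dim=8, shift=2.0, seed=0):
    """Hypothetical stand-in for CNN feature vectors: two Gaussian clusters."""
    rng = random.Random(seed)
    features, labels = [], []
    for label in (+1, -1):
        for _ in range(n_per_class):
            features.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            labels.append(label)
    return features, labels


def score(w, x):
    """Linear decision value; the last weight acts as the bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(features, labels, lam=0.01, epochs=20, seed=0):
    """Binary linear SVM (labels +1/-1) fitted by stochastic subgradient
    descent on the L2-regularized hinge loss (Pegasos-style step size)."""
    rng = random.Random(seed)
    w = [0.0] * (len(features[0]) + 1)  # one extra weight for the bias
    order = list(range(len(features)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            x_aug = features[i] + [1.0]  # constant feature carries the bias
            violated = labels[i] * score(w, features[i]) < 1.0
            # Shrink weights (regularization), then correct margin violations.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, x_aug)]
    return w


def accuracy(w, features, labels):
    """Fraction of samples whose predicted sign matches the label."""
    correct = sum(
        1 for x, y in zip(features, labels)
        if (1 if score(w, x) >= 0 else -1) == y
    )
    return correct / len(labels)


# The features are never modified during training; only the linear
# classifier is fitted, which is the essence of feature transfer.
train_x, train_y = make_synthetic_features(n_per_class=50, seed=1)
test_x, test_y = make_synthetic_features(n_per_class=50, seed=2)
weights = train_linear_svm(train_x, train_y)
test_accuracy = accuracy(weights, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In a real pipeline the synthetic vectors would be replaced by activations extracted from the pretrained network, the binary classifier would be extended to many categories (for example one-vs-rest), and an established SVM library would likely be preferable to this hand-written trainer.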

SUBMITTER: Valan M

PROVIDER: S-EPMC6802574 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks.

Valan M, Makonyi K, Maki A, Vondráček D, Ronquist F

Systematic Biology, 2019 Nov 1, issue 6

PMID: 30825372

Similar Datasets

Project description: Mechanical cues such as stresses and strains are now recognized as essential regulators in many biological processes like cell division, gene expression or morphogenesis. Studying the interplay between these mechanical cues and biological responses requires experimental tools to measure these cues. In the context of large scale tissues, this can be achieved by segmenting individual cells to extract their shapes and deformations which in turn inform on their mechanical environment. Historically, this has been done by segmentation methods which are well known to be time consuming and error prone. In this context however, one doesn't necessarily require a cell-level description and a coarse-grained approach can be more efficient while using tools different from segmentation. The advent of machine learning and deep neural networks has revolutionized the field of image analysis in recent years, including in biomedical research. With the democratization of these techniques, more and more researchers are trying to apply them to their own biological systems. In this paper, we tackle a problem of cell shape measurement thanks to a large annotated dataset. We develop simple Convolutional Neural Networks (CNNs) which we thoroughly optimize in terms of architecture and complexity to question construction rules usually applied. We find that increasing the complexity of the networks rapidly no longer yields improvements in performance and that the number of kernels in each convolutional layer is the most important parameter to achieve good results. In addition, we compare our step-by-step approach with transfer learning and find that our simple, optimized CNNs give better predictions, are faster in training and analysis and don't require more technical knowledge to be implemented. Overall, we offer a roadmap to develop optimized models and argue that we should limit the complexity of such models. We conclude by illustrating this strategy on a similar problem and dataset.

Project description: Cardiovascular magnetic resonance (CMR) imaging is a standard imaging modality for assessing cardiovascular diseases (CVDs), the leading cause of death globally. CMR enables accurate quantification of the cardiac chamber volume, ejection fraction and myocardial mass, providing information for diagnosis and monitoring of CVDs. However, for years, clinicians have been relying on manual approaches for CMR image analysis, which is time consuming and prone to subjective errors. It is a major clinical challenge to automatically derive quantitative and clinically relevant information from CMR images. Deep neural networks have shown a great potential in image pattern recognition and segmentation for a variety of tasks. Here we demonstrate an automated analysis method for CMR images, which is based on a fully convolutional network (FCN). The network is trained and evaluated on a large-scale dataset from the UK Biobank, consisting of 4,875 subjects with 93,500 pixelwise annotated images. The performance of the method has been evaluated using a number of technical metrics, including the Dice metric, mean contour distance and Hausdorff distance, as well as clinically relevant measures, including left ventricle (LV) end-diastolic volume (LVEDV) and end-systolic volume (LVESV), LV mass (LVM); right ventricle (RV) end-diastolic volume (RVEDV) and end-systolic volume (RVESV). By combining FCN with a large-scale annotated dataset, the proposed automated method achieves a high performance in segmenting the LV and RV on short-axis CMR images and the left atrium (LA) and right atrium (RA) on long-axis CMR images. On a short-axis image test set of 600 subjects, it achieves an average Dice metric of 0.94 for the LV cavity, 0.88 for the LV myocardium and 0.90 for the RV cavity. The mean absolute difference between automated measurement and manual measurement is 6.1 mL for LVEDV, 5.3 mL for LVESV, 6.9 g for LVM, 8.5 mL for RVEDV and 7.2 mL for RVESV. On long-axis image test sets, the average Dice metric is 0.93 for the LA cavity (2-chamber view), 0.95 for the LA cavity (4-chamber view) and 0.96 for the RA cavity (4-chamber view). The performance is comparable to human inter-observer variability. We show that an automated method achieves a performance on par with human experts in analysing CMR images and deriving clinically relevant measures.

Project description: Importance: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose. Objective: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience. Design, setting, and participants: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy. Main outcomes and measures: The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures. Results: Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P < .001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P = .001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P = .18). Conclusions and relevance: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.

Project description: Purpose: To develop a fast and accurate convolutional neural network based method for segmentation of thalamic nuclei. Methods: A cascaded multi-planar scheme with a modified residual U-Net architecture was used to segment thalamic nuclei on conventional and white-matter-nulled (WMn) magnetization prepared rapid gradient echo (MPRAGE) data. A single network was optimized to work with images from healthy controls and patients with multiple sclerosis (MS) and essential tremor (ET), acquired at both 3 T and 7 T field strengths. WMn-MPRAGE images were manually delineated by a trained neuroradiologist using the Morel histological atlas as a guide to generate reference ground truth labels. Dice similarity coefficient and volume similarity index (VSI) were used to evaluate performance. Clinical utility was demonstrated by applying this method to study the effect of MS on thalamic nuclei atrophy. Results: Segmentation of each thalamus into twelve nuclei was achieved in under a minute. For 7 T WMn-MPRAGE, the proposed method outperforms current state-of-the-art on patients with ET with statistically significant improvements in Dice for five nuclei (increase in the range of 0.05-0.18) and VSI for four nuclei (increase in the range of 0.05-0.19), while performing comparably for healthy and MS subjects. Dice and VSI achieved using 7 T WMn-MPRAGE data are comparable to those using 3 T WMn-MPRAGE data. For conventional MPRAGE, the proposed method shows a statistically significant Dice improvement in the range of 0.14-0.63 over FreeSurfer for all nuclei and disease types. Effect of noise on network performance shows robustness to images with SNR as low as half the baseline SNR. Atrophy of four thalamic nuclei and whole thalamus was observed for MS patients compared to healthy control subjects, after controlling for the effect of parallel imaging, intracranial volume, gender, and age (p < 0.004). Conclusion: The proposed segmentation method is fast, accurate, performs well across disease types and field strengths, and shows great potential for improving our understanding of thalamic nuclei involvement in neurological diseases.
