Project description:We present two new fisheye image datasets for training object and face detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a Matlab implementation of a model that maps regular images to fisheye images. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while efforts to collect and manually annotate true fisheye images are underway.
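The specific Matlab mapping model is not described in this summary. As an illustrative stand-in, a common choice for regular-to-fisheye conversion is the equidistant projection, where a point's radial distance becomes proportional to its incidence angle; a minimal sketch of that mapping, assuming a normalized focal length, might look like:

```python
import math

def to_fisheye(x, y, f=1.0):
    """Map a point from a rectilinear (pinhole) image plane to an
    equidistant fisheye plane. Coordinates are centered on the
    principal point; f is a hypothetical focal length."""
    r_u = math.hypot(x, y)       # radial distance in the pinhole image
    if r_u == 0:
        return (0.0, 0.0)        # the center is unchanged
    theta = math.atan2(r_u, f)   # incidence angle of the incoming ray
    r_f = f * theta              # equidistant model: r = f * theta
    scale = r_f / r_u
    return (x * scale, y * scale)
```

Points far from the center are compressed more strongly, which reproduces the characteristic barrel distortion of fisheye imagery.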
Project description:The ability to re-identify individuals is fundamental to the individual-based studies that are required to estimate many important ecological and evolutionary parameters in wild populations. Traditional methods of marking individuals and tracking them through time can be invasive and imperfect, which can affect these estimates and create uncertainties for population management. Here we present a photographic re-identification method that uses spot constellations in images to match specimens through time. Photographs of Arctic charr (Salvelinus alpinus) were used as a case study. Classical computer vision techniques were compared with new deep-learning techniques for masks and spot extraction. We found that a U-Net approach trained on a small set of human-annotated photographs performed substantially better than a baseline feature engineering approach. For matching the spot constellations, two algorithms were adapted, and, depending on whether a fully or semi-automated set-up is preferred, we show how either one or a combination of these algorithms can be implemented. Within our case study, our pipeline both successfully identified unmarked individuals from photographs alone and re-identified individuals that had lost tags, resulting in an approximately 4% increase in our estimate of survival rate. Overall, our multi-step pipeline involves little human supervision and could be applied to many organisms.
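The two adapted constellation-matching algorithms are not detailed in this summary. A much-simplified greedy nearest-spot baseline (hypothetical, for illustration only, with an assumed pixel tolerance) conveys the core idea of scoring how well two spot sets align:

```python
def match_score(spots_a, spots_b, tol=5.0):
    """Crude constellation match: greedily pair each spot in A with the
    nearest unused spot in B and report the fraction of pairs falling
    within `tol` pixels. The paper's adapted algorithms are more
    sophisticated; this is only an illustrative baseline."""
    unused = list(spots_b)
    hits = 0
    for ax, ay in spots_a:
        if not unused:
            break
        best = min(unused, key=lambda p: (p[0] - ax) ** 2 + (p[1] - ay) ** 2)
        if (best[0] - ax) ** 2 + (best[1] - ay) ** 2 <= tol * tol:
            hits += 1
            unused.remove(best)  # each spot in B may be matched once
    return hits / max(len(spots_a), 1)
```

Two photographs of the same individual should produce a high score, while unrelated spot patterns score near zero.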
Project description:Genomic global positioning system (GPS) applies the multilateration technique commonly used in the GPS to genomic data. In the framework we present here, investigators calculate genetic distances from their samples to reference samples, which are from data held in the public domain, and share this information with others. This sharing enables certain types of genomic analysis, such as identifying sample overlaps and close relatives, decomposing ancestry, and mapping geographical origin, without disclosing personal genomes. Thus, our method can be seen as a balance between open data sharing and privacy protection.
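A minimal sketch of the distance-sharing idea, assuming genotypes encoded as 0/1/2 allele counts and Euclidean distance (the framework's actual distance metric may differ):

```python
def distances_to_references(sample, references):
    """Compute the Euclidean distance from one genotype vector
    (e.g. 0/1/2 allele counts per variant) to each public reference
    sample. Only these distances are shared, never the genome itself."""
    return [
        sum((s - r) ** 2 for s, r in zip(sample, ref)) ** 0.5
        for ref in references
    ]
```

Two investigators holding the same individual's genome would compute identical distance vectors, which is what makes sample-overlap detection possible without exchanging genotypes.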
Project description:Failure to identify difficult intubation is the leading cause of anesthesia-related death and morbidity. Despite preoperative airway assessment, 75-93% of difficult intubations are unanticipated, and airway examination methods underperform, with sensitivities of 20-62% and specificities of 82-97%. To overcome these impediments, we aim to develop a deep learning model that identifies difficult-to-intubate patients from frontal face images. We propose an ensemble of convolutional neural networks that leverages a database of celebrity facial images to learn robust features of multiple face regions. This ensemble extracts features from patient images (n = 152), which are subsequently classified by a corresponding ensemble of attention-based multiple instance learning models. Through majority voting, a patient is classified as difficult or easy to intubate. Whereas two conventional bedside tests resulted in AUCs of 0.6042 and 0.4661, the proposed method achieved an AUC of 0.7105 on a cohort of 76 difficult and 76 easy-to-intubate patients. Generic features yielded AUCs of 0.4654-0.6278. The proposed model can operate at high sensitivity and low specificity (0.9079 and 0.4474) or low sensitivity and high specificity (0.3684 and 0.9605). The proposed ensemble model thus significantly surpasses both conventional bedside tests and generic deep learning features; side facial images may further improve its performance. We expect our model to play an important role in developing deep learning methods that rely on frontal face features.
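The majority-voting step over the per-region classifiers can be sketched as follows (a minimal illustration, not the authors' implementation):

```python
def majority_vote(predictions):
    """Combine binary predictions from the per-region classifiers
    (1 = difficult to intubate, 0 = easy) by simple majority vote."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0
```

With an odd number of region classifiers, ties cannot occur, which is one reason majority voting is a convenient fusion rule for such ensembles.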
Project description:Ticks and tick-borne diseases represent a growing public health threat in North America and Europe. The number of ticks, their geographical distribution, and the incidence of tick-borne diseases, like Lyme disease, are all on the rise. Accurate, real-time tick-image identification through a smartphone app or similar platform could help mitigate this threat by informing users of the risks associated with encountered ticks and by providing researchers and public health agencies with additional data on tick activity and geographic range. Here we outline the requirements for such a system, present a model that meets those requirements, and discuss remaining challenges and frontiers in automated tick identification. We compiled a user-generated dataset of more than 12,000 images of the three most common tick species found on humans in the U.S.: Amblyomma americanum, Dermacentor variabilis, and Ixodes scapularis. We used image augmentation to further increase the size of our dataset to more than 90,000 images. Here we report the development and validation of a convolutional neural network, "TickIDNet," which achieves 87.8% identification accuracy across all three species, outperforming identifications made by members of the general public or healthcare professionals. However, the model fails to match the performance of experts with formal entomological training. We find that image quality, particularly the size of the tick in the image (measured in pixels), plays a significant role in the network's ability to correctly identify an image: images where the tick is small are less likely to be correctly identified because of the small object detection problem in deep learning. TickIDNet's performance can be increased by using confidence thresholds to introduce an "unsure" class and building image submission pipelines that encourage better quality photos. Our findings suggest that deep learning represents a promising frontier for tick identification that should be further explored and deployed as part of the toolkit for addressing the public health consequences of tick-borne diseases.
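The confidence-threshold idea for introducing an "unsure" class can be sketched as follows (the threshold value is an illustrative assumption, not the one used by TickIDNet):

```python
def classify_with_threshold(probs, labels, threshold=0.8):
    """Return the top species label only when the network's top softmax
    probability clears `threshold`; otherwise abstain with 'unsure'."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "unsure"

species = ["Amblyomma americanum", "Dermacentor variabilis",
           "Ixodes scapularis"]
```

Raising the threshold trades coverage for accuracy: more images are routed to human review, but the labels that are returned are more trustworthy.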
Project description:In recent years, the security of personal data has become progressively more important. In this regard, identification systems based on multibiometric fusion are recommended for achieving significantly higher accuracy. The main purpose of this paper is to propose a hybrid system that combines three efficient models: a convolutional neural network (CNN), Softmax, and a random forest (RF) classifier, in a multi-biometric identification system based on fingerprint, finger-vein, and face. In the fingerprint subsystem, image pre-processing separates the foreground and background regions using the K-means and DBSCAN algorithms; features are then extracted with CNNs and dropout, and Softmax acts as the recognizer. In the finger-vein subsystem, the region-of-interest image, contrast-enhanced with an exposure fusion framework, is fed into the CNN model, and an RF classifier performs the classification. In the face subsystem, a CNN architecture and Softmax generate face feature vectors and classify the person. The scores produced by these subsystems are combined to improve identification. The proposed algorithm is evaluated on the publicly available SDUMLA-HMT real multimodal biometric database using a GPU-based implementation. Experimental results on this dataset show significant capability for a biometric identification system: the proposed work offers accurate and efficient matching compared with other systems based on unimodal, bimodal, or multimodal characteristics.
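The score-combination step can be sketched as weighted score-level fusion (the paper's exact fusion rule is not given in this summary; equal weights and scores normalized to [0, 1] are assumptions):

```python
def fuse_scores(scores, weights=None):
    """Score-level fusion: combine per-modality match scores in [0, 1]
    (e.g. fingerprint, finger-vein, face) into a single identification
    score via a weighted sum. Equal weights by default."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(s * w for s, w in zip(scores, weights))
```

Fusion at the score level is a common design choice because each modality's classifier (Softmax or RF here) can stay independent; only its scalar output must be calibrated to a common range.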
Project description:While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
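The quantity being estimated, individual uniqueness given a set of demographic attributes, can be illustrated on a fully observed toy dataset. Note the paper's contribution is estimating this from heavily incomplete data with a generative copula model, which this empirical sketch does not attempt:

```python
from collections import Counter

def empirical_uniqueness(records, attrs):
    """Fraction of individuals whose combination of values for the
    given attributes is unique in the dataset. Computable directly
    only when the dataset is complete, unlike the paper's setting."""
    keys = [tuple(rec[a] for a in attrs) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)
```

As more attributes (ZIP code, birth date, gender, ...) are added, combinations become rarer and uniqueness climbs toward 1, which is the intuition behind the 15-attribute result.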
Project description:Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.
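Unicity, the fraction of individuals uniquely pinned down by a given number of auxiliary spatio-temporal points, can be estimated empirically with a sketch like the following (toy traces and a Monte Carlo estimate; not the authors' model or lower bound):

```python
import random

def unicity(traces, k, trials=200, seed=0):
    """Monte Carlo estimate of unicity: draw k points from a random
    user's trace and check whether that user is the only one whose
    trace contains all k points. `traces` maps user id -> set of
    (location, time) tuples."""
    rng = random.Random(seed)
    users = list(traces)
    unique = 0
    for _ in range(trials):
        u = rng.choice(users)
        pts = rng.sample(sorted(traces[u]), k)  # k auxiliary points
        matches = [v for v in users if all(p in traces[v] for p in pts)]
        if matches == [u]:
            unique += 1
    return unique / trials
```

On country-scale data this empirical estimate becomes expensive, which is why the paper instead models the decrease of unicity with dataset size using population-wide marginal distributions.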
Project description:Atherosclerosis is one of the most common types of cardiovascular disease and the prime cause of mortality in the aging population worldwide. However, the detailed mechanisms and specific biomarkers of atherosclerosis remain to be investigated. Lately, long non-coding RNAs (lncRNAs) have attracted more attention than other types of ncRNAs. In this work, we identified and confirmed differentially expressed lncRNAs and mRNAs in atherosclerosis by analyzing GSE28829. We performed weighted gene co-expression network analysis (WGCNA) on GSE40231 to identify highly correlated genes. Gene Ontology (GO) analysis was used to assess the potential functions of the differentially expressed lncRNAs in atherosclerosis. Co-expression networks were also constructed to identify hub lncRNAs in atherosclerosis. A total of 5784 mRNAs and 654 lncRNAs were found to be dysregulated in the progression of atherosclerosis, and 15 lncRNA-mRNA co-expression modules were identified based on the WGCNA analysis. Moreover, several lncRNAs, including ZFAS1, LOC100506730, LOC100506691, DOCK9-AS2, RP11-6I2.3, and LOC100130219, were confirmed as important lncRNAs in atherosclerosis. Taken together, the bioinformatics analysis revealed that these lncRNAs are involved in regulating the leukotriene biosynthetic process, gene expression, actin filament organization, t-circle formation, antigen processing and presentation, the interferon-gamma-mediated signaling pathway, and activation of GTPase activity. We believe this study provides potential novel therapeutic and prognostic targets for atherosclerosis.
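Co-expression between a lncRNA and an mRNA is typically scored by correlating their expression profiles across samples. A minimal sketch with a hard correlation cutoff follows; note that WGCNA instead soft-thresholds correlations into a weighted network, so both the cutoff and the hard-threshold rule are simplifying assumptions:

```python
def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def coexpressed(lnc_profile, mrna_profile, cutoff=0.9):
    """Flag a lncRNA-mRNA pair as co-expressed when |r| >= cutoff.
    A hard cutoff is a simplified stand-in for WGCNA's soft
    thresholding and module detection."""
    return abs(pearson(lnc_profile, mrna_profile)) >= cutoff
```

Hub lncRNAs in such a network are then the nodes with the most strong connections to mRNAs within a module.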