Dataset Information

Examining influential factors for acknowledgements classification using supervised learning.

ABSTRACT: Acknowledgements have been examined as important elements in measuring the contributions to, and intellectual debts of, a scientific publication. Unlike previous studies, which were limited in scope and relied on manual examination, the present study aimed to classify acknowledgements automatically on a large scale. To this end, we first created a training dataset for acknowledgements classification by sampling acknowledgements sections from the entire PubMed Central database. Second, we applied various supervised learning algorithms to examine which algorithm performed best under which conditions. In addition, we examined the factors affecting classification performance, investigating the effects of three main aspects: classification algorithms, categories, and text representations. The CNN+Doc2Vec algorithm achieved the highest accuracy: 93.58% on the original dataset and 87.93% on the converted dataset. The experimental results indicated that the characteristics of the categories and the sentence patterns influenced classification performance. Most of the classifiers performed better on the financial, peer interactive communication, and technical support categories than on the other classes.
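The abstract describes training supervised classifiers on labelled acknowledgement text and comparing them by accuracy. The study's best model was CNN+Doc2Vec; the sketch below swaps in a much simpler technique, a multinomial Naive Bayes baseline over bag-of-words features, only to illustrate the basic train, predict, and accuracy loop. The three category labels follow classes named in the abstract, but the example sentences, the tokenizer, and the `NaiveBayesClassifier` class are hypothetical and are not taken from the study's dataset or code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep alphabetic runs only
    # (digits and punctuation are dropped).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes baseline with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)       # documents per category (for priors)
        self.word_counts = defaultdict(Counter)   # token counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / n_docs)
            for token in tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    # Fraction of predictions that match the reference labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy sentences labelled with three category names from the abstract.
TRAIN = [
    ("This work was supported by a grant from the National Science Foundation.", "financial"),
    ("Funding was provided by the research council under grant number 123.", "financial"),
    ("We thank our colleagues for helpful discussions and comments on the manuscript.",
     "peer_interactive_communication"),
    ("The authors are grateful to the reviewers for their insightful comments.",
     "peer_interactive_communication"),
    ("We thank the core facility staff for technical assistance with sequencing.",
     "technical_support"),
    ("Technical support for the microscopy was provided by the imaging lab.",
     "technical_support"),
]
TEST = [
    ("This research was funded by a grant from the ministry.", "financial"),
    ("We thank our colleagues for comments on earlier drafts.", "peer_interactive_communication"),
    ("We are grateful for technical assistance from the facility staff.", "technical_support"),
]

model = NaiveBayesClassifier().fit([s for s, _ in TRAIN], [c for _, c in TRAIN])
predictions = [model.predict(s) for s, _ in TEST]
print(predictions, accuracy([c for _, c in TEST], predictions))
```

Because the evaluation loop does not depend on the model, the same `accuracy` check could be reused to compare alternative algorithms or text representations, in the spirit of the comparison the abstract describes.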

SUBMITTER: Song M

PROVIDER: S-EPMC7021295 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Examining influential factors for acknowledgements classification using supervised learning.

Min Song, Keun Young Kang, Tatsawan Timakum, Xinyuan Zhang

PLoS One, 14 February 2020, issue 2

PMID: 32059035

Similar Datasets

Project description: BACKGROUND: During atherosclerosis, the arterial lumen narrows as biocompounds accumulate and plaque forms within the artery walls. A non-linear optical imaging modality (NLOM), coherent anti-Stokes Raman scattering (CARS) microscopy, can be used to image lipid-rich structures commonly found in atherosclerotic plaques. By matching the molecular vibrational frequencies of lipids (CH bonds), it is possible to map the accumulation of lipid-rich structures without exogenous labelling or processing of the samples. CARS allows the morphological features of plaque to be visualized, and in combination with supervised machine learning, CARS-imaged morphological features can be used to characterize the progression of atherosclerotic plaques.

RESULTS: Based on a set of label-free CARS images of atherosclerotic plaques (i.e. foam cell clusters) from a Watanabe heritable hyperlipidemic rabbit model, we developed an automated pipeline to classify atherosclerotic lesions based on their major morphological features. Our method first uses image preprocessing to improve the quality of the CARS-imaged plaque, then segments the plaque using Otsu thresholding, marker-controlled watershed, K-means segmentation and a novel independent foam cell thresholding segmentation. To define relevant morphological features, 27 quantitative features were extracted and further refined by a novel coefficient-of-variation feature refinement method in accordance with filter-type feature selection. The refined morphological features were supplied to three supervised machine learning algorithms: K-nearest neighbour, support vector machine and decision tree classifiers. The classification pipeline exploited relevant plaque morphological features to classify three pre-defined stages of atherosclerosis, namely early fatty streak development (EFS), EF and advancing atheroma (AA), with greater than 85% class accuracy.

CONCLUSIONS: Through the combination of CARS microscopy and computational methods, a powerful classification tool was developed to identify the progression of atherosclerotic plaque in an automated manner. Using a curated dataset, the classification pipeline demonstrated the ability to differentiate between EFS, EF and AA. This presents the opportunity to classify the onset of atherosclerosis at an earlier stage of development.
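Otsu thresholding, one of the segmentation steps listed above, picks the intensity cut that maximizes the between-class variance of the two resulting pixel groups. The minimal sketch below works on a flat list of 8-bit grayscale intensities; the function names `otsu_threshold` and `binarize` and the toy intensity values are hypothetical, and this is not the authors' pipeline. On real image arrays, a library routine such as scikit-image's `skimage.filters.threshold_otsu` would typically be used instead.

```python
def otsu_threshold(pixels: list[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the background class; pixels > t form the foreground.
    """
    # Histogram of intensity values (assumes integers in [0, levels)).
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_bg = 0  # number of pixels with value <= t
    sum_bg = 0     # sum of intensities of pixels with value <= t
    best_t, best_score = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background; no further split is possible
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance up to a constant factor (counts instead of probabilities).
        score = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Label pixels above the threshold as foreground (1) and the rest as background (0)."""
    return [1 if p > threshold else 0 for p in pixels]


# Hypothetical toy intensities: a dark background mode and a bright foreground mode.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [210] * 50
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
print(t, sum(mask))  # prints "12 100"
```

Any cut between the two intensity modes yields the same split here; the strict `>` comparison keeps the lowest such threshold.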

OmicsDI is part of the ELIXIR infrastructure