Dataset Information

Evaluating human versus machine learning performance in classifying research abstracts.

ABSTRACT: We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit both undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against the support vector machine ML algorithm at classifying European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline groups. On average, ML is more accurate than human classifiers, across a variety of training and test datasets, and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers, meaning that different ML classifiers are more consistent than different human classifiers in assigning the same classification to any given abstract. While the top five percent of human classifiers can outperform ML in limited cases, selecting and training such classifiers is likely costly and difficult compared to training ML models. Our results suggest ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
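The two quantities the abstract compares, accuracy against the actual panel and consistency between classifiers, can be illustrated with a minimal sketch. This is not the authors' code or necessarily their reliability statistic: the function names, the use of mean pairwise agreement as a stand-in for "reliability", and the toy panel labels A, B and C are all assumptions made for illustration.

```python
from itertools import combinations


def accuracy(predicted, actual):
    """Share of abstracts whose predicted panel equals the actual panel."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_pairwise_agreement(classifier_labels):
    """Average, over all pairs of classifiers, of the share of abstracts
    to which both classifiers in the pair assigned the same panel."""
    pairs = list(combinations(classifier_labels, 2))
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: four abstracts and three made-up panel labels.
actual = ["A", "B", "C", "B"]

# Labels from three ML models trained on different training sets (invented).
ml_runs = [
    ["A", "B", "C", "B"],
    ["A", "B", "C", "A"],
    ["A", "B", "C", "B"],
]

# Labels from three human classifiers (invented).
human_raters = [
    ["A", "C", "C", "B"],
    ["B", "B", "C", "A"],
    ["A", "B", "B", "B"],
]

ml_agreement = mean_pairwise_agreement(ml_runs)
human_agreement = mean_pairwise_agreement(human_raters)
ml_accuracy = [accuracy(run, actual) for run in ml_runs]
human_accuracy = [accuracy(rater, actual) for rater in human_raters]
```

In this invented example the ML runs agree with each other more often than the human raters do, which is the pattern the abstract describes; the numbers themselves carry no empirical meaning.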

SUBMITTER: Goh YC

PROVIDER: S-EPMC7367789 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

Evaluating human versus machine learning performance in classifying research abstracts.

Goh Yeow Chong, Cai Xin Qing, Theseira Walter, Ko Giovanni, Khor Khiam Aik

Scientometrics, 2020-07-18, issue 2


PMID: 32836529

Similar Datasets

Project description: Background and objective: Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods: To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results: The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions: In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions.

Project description: Background: Signaling proteins such as protein kinases adopt a diverse array of conformations to respond to regulatory signals in signaling pathways. Perhaps the most fundamental conformational change of a kinase is the transition between active and inactive states, and defining the conformational features associated with kinase activation is critical for selectively targeting abnormally regulated kinases in diseases. While manual examination of crystal structures has led to the identification of key structural features associated with kinase activation, the large number of kinase crystal structures (~3,500) and the extensive conformational diversity displayed by the protein kinase superfamily pose unique challenges in fully defining the conformational features associated with kinase activation. Although some computational approaches have been proposed, they are typically based on a small subset of crystal structures using measurements biased towards the active site geometry. Results: We utilize an unbiased informatics-based machine learning approach to classify all eukaryotic protein kinase conformations deposited in the PDB. We show that the orientation of the activation segment, measured by φ, ψ, χ1, and pseudo-dihedral angles, classifies kinase crystal conformations more accurately than existing methods. We show that the formation of the K-E salt bridge is statistically dependent upon the activation segment orientation and identify evolutionary differences between the activation segment conformation of tyrosine and serine/threonine kinases. We provide evidence that our method can identify conformational changes associated with the binding of allosteric regulatory proteins, and show that the greatest variation in inactive structures comes from kinase group and family specific side chain orientations. Conclusion: We have provided the first comprehensive machine learning-based classification of protein kinase active/inactive conformations, taking into account more structures and measurements than any previous classification effort. Further, our unbiased classification of inactive structures reveals residues associated with kinase functional specificity. To enable classification of new crystal structures, we have made our classifier publicly accessible through a stand-alone program housed at https://github.com/esbg/kinconform [DOI: 10.5281/zenodo.249090].

Project description: The evaluation of grouting effects constitutes a critical aspect of grouting engineering. With the maturity of the grouting project, the workload and empirical characteristics of grouting effect evaluation are gradually revealed. In the context of the Qiuji coal mine's directional drilling and grouting to limestone aquifer reformation, this study thoroughly analyzes the influencing factors of grouting effects from geological and engineering perspectives, comparing these with various engineering indices associated with drilling and grouting. This led to the establishment of a "dual-process, multi-parameter, and multi-factor" system, employing correlation analysis to validate the selected indices' reasonableness and scientific merit. Utilizing the chosen indices, eight high-performing machine learning models and three parameter optimization algorithms were employed to develop a model for assessing the effectiveness of directional grouting in limestone aquifers. The model's efficacy was evaluated based on accuracy, recall, precision, and F-score metrics, followed by practical engineering validation. Results indicate that the "dual-process, multi-parameter, multi-factor" system elucidates the relationship between influencing factors and engineering parameters, demonstrating the intricacy of evaluating grouting effects. Analysis revealed that the correlation among the eight selected indicators (the proportion of boreholes in the target rock strata, drilling length, leakage, water level, pressure of grouting, mass of slurry injected, permeability properties of limestone aquifers before being grouted, and permeability properties of limestone aquifers after being grouted) is not substantial, underscoring their viability as independent indicators for grouting effect evaluation. Comparative analysis showed that the Adaboost machine learning model, optimized via a genetic algorithm, demonstrated superior performance and more accurate evaluation results. Engineering validation confirmed that this model provides a more precise and realistic assessment of grouting effects compared to traditional methods.
