Dataset Information

Predicting Writing Styles of Web-Based Materials for Children's Health Education Using the Selection of Semantic Features: Machine Learning Approach.

ABSTRACT:

Background

Medical writing styles can have an impact on the understandability of health educational resources. Amid current web-based health information research, there is a dearth of research-based evidence that demonstrates what constitutes the best practice of the development of web-based health resources on children's health promotion and education.

Objective

Using authoritative and highly influential web-based children's health educational resources from the Nemours Foundation, the largest not-for-profit organization promoting children's health and well-being, we aimed to develop machine learning algorithms to discriminate and predict the writing styles of health educational resources on children versus adult health promotion using a variety of health educational resources aimed at the general public.

Methods

The selection of natural language features as predictor variables for the algorithms went through initial automatic feature selection using a ridge classifier, a support vector machine, an extreme gradient boost tree, and recursive feature elimination, followed by revision by education experts. We compared algorithms using the automatically selected (n=19) and linguistically enhanced (n=20) feature sets, using the initial feature set (n=115) as the baseline.
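The abstract does not describe how the automatic selection step was implemented. Purely as an illustration of recursive feature elimination, the sketch below repeatedly fits a ridge-penalized linear scorer on labels coded as +1 and -1 and drops the feature with the smallest absolute weight. The synthetic data, the helper names (`solve`, `ridge_weights`, `rfe`), the penalty `alpha=1.0`, and the toy size of 6 features are all assumptions made for the example; none of them come from the study.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_weights(X, y, alpha=1.0):
    """Closed-form ridge fit without intercept: w = (X^T X + alpha*I)^-1 X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)

def rfe(X, y, n_keep):
    """Refit on the surviving features and drop the weakest one until n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        w = ridge_weights(sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        kept.pop(weakest)
    return kept

# Hypothetical toy data: 6 standardized features, only features 0 and 3 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [1.0 if row[0] - row[3] + rng.gauss(0, 0.1) > 0 else -1.0 for row in X]

selected = rfe(X, y, n_keep=2)
print(sorted(selected))
```

Ranking by absolute weight is only meaningful when the features share a common scale, which is why the toy features are drawn with unit variance. The expert revision step described above is a manual process and is not modeled here.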

Results

Using five-fold cross-validation, compared with the baseline (115 features), the Gaussian Naive Bayes model (20 features) achieved statistically higher mean sensitivity (P=.02; 95% CI -0.016 to 0.1929), mean specificity (P=.02; 95% CI -0.016 to 0.199), mean area under the receiver operating characteristic curve (P=.02; 95% CI -0.007 to 0.140), and mean macro F1 (P=.006; 95% CI 0.016-0.167). The statistically improved performance of the final model (20 features) is in contrast to the statistically insignificant changes between the original feature set (n=115) and the automatically selected features (n=19): mean sensitivity (P=.13; 95% CI -0.1699 to 0.0681), mean specificity (P=.10; 95% CI -0.1389 to 0.4017), mean area under the receiver operating characteristic curve (P=.008; 95% CI 0.0059-0.1126), and mean macro F1 (P=.98; 95% CI -0.0555 to 0.0548). This demonstrates the importance and effectiveness of combining automatic feature selection and expert-based linguistic revision to develop the most effective machine learning algorithms from high-dimensional data sets.
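For readers unfamiliar with the evaluation setup, the following is a minimal sketch of five-fold cross-validation of a Gaussian Naive Bayes classifier that reports mean sensitivity, specificity, and macro F1. It is not the authors' code: the synthetic two-class data, the function names, the unstratified fold split, and the small variance floor of `1e-9` are assumptions chosen for illustration, and the area under the ROC curve is omitted for brevity.

```python
import math
import random

def fit_gnb(X, y):
    """Estimate the prior, per-feature means, and per-feature variances of each class."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(y), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest Gaussian log-posterior for sample x."""
    def log_post(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances))
    return max(model, key=log_post)

def fold_metrics(y_true, y_pred):
    """Sensitivity, specificity, and macro F1 for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1_pos = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    f1_neg = 2 * tn / (2 * tn + fp + fn) if tn + fp + fn else 0.0
    return sensitivity, specificity, (f1_pos + f1_neg) / 2

def cross_validate(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and average the per-fold metrics."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        preds = [predict_gnb(model, X[i]) for i in held_out]
        scores.append(fold_metrics([y[i] for i in held_out], preds))
    return [sum(m) / k for m in zip(*scores)]

# Hypothetical toy data: class 1 features are shifted by 2 relative to class 0.
rng = random.Random(1)
X = [[rng.gauss(2.0 * label, 1.0) for _f in range(3)]
     for label in (0, 1) for _n in range(100)]
y = [label for label in (0, 1) for _n in range(100)]

mean_sens, mean_spec, mean_f1 = cross_validate(X, y)
print(f"sensitivity={mean_sens:.3f} specificity={mean_spec:.3f} macro_F1={mean_f1:.3f}")
```

Comparing two feature sets as in the results above would mean running this procedure once per feature set on the same folds and then testing the per-fold differences; the abstract does not state which statistical test was used, so that step is left out.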

Conclusions

We developed new evaluation tools for the discrimination and prediction of writing styles of web-based health resources for children's health education and promotion among parents and caregivers of children. User-adaptive automatic assessment of web-based health content holds great promise for distant and remote health education among young readers. Our study leveraged the precision and adaptability of machine learning algorithms and insights from health linguistics to help advance this significant yet understudied area of research.

SUBMITTER: Xie W

PROVIDER: S-EPMC8367110 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Publications

Predicting Writing Styles of Web-Based Materials for Children's Health Education Using the Selection of Semantic Features: Machine Learning Approach.

Xie W, Ji M, Liu Y, Hao T, Chow CY

JMIR Medical Informatics, 2021-07-22, issue 7

PMID: 34292167

Similar Datasets

Project description: The characterization of natural spaces by the precise observation of their material properties is in high demand in remote sensing and computer vision. The production of novel sensors enables the collection of heterogeneous data to build comprehensive knowledge of the living and non-living entities in the ecosystem. High-resolution consumer-grade RGB cameras are frequently used for the geometric reconstruction of many types of environments. Nevertheless, the understanding of natural spaces is still challenging. The automatic segmentation of homogeneous materials in nature is a complex task because overlapping structures and indirect illumination make object recognition difficult. In this paper, we propose a method based on fusing spatial and multispectral characteristics for the unsupervised classification of natural materials in a point cloud. A high-resolution camera and a multispectral sensor are mounted on a custom camera rig in order to simultaneously capture RGB and multispectral images. Our method is tested in a controlled scenario, where different natural objects coexist. Initially, the input RGB images are processed to generate a point cloud by applying the structure-from-motion (SfM) algorithm. Then, the multispectral images are mapped onto the three-dimensional model to characterize the geometry with the reflectance captured from four narrow bands (green, red, red-edge and near-infrared). The reflectance, the visible colour and the spatial component are combined to extract key differences among all existing materials. For this purpose, a hierarchical cluster analysis is applied to pool the point cloud and identify the feature pattern for every material. As a result, the tree trunk, the leaves, different species of low plants, the ground and rocks can be clearly recognized in the scene. These results demonstrate the feasibility of performing semantic segmentation by considering multispectral and spatial features with an unknown number of clusters to be detected on the point cloud. Moreover, our solution is compared with another method based on supervised learning in order to test the improvement of the proposed approach.

Project description: Background: Thousands of web searches are performed related to transarterial chemoembolization (TACE), given its palliative role in the treatment of liver cancer. Objective: This study aims to assess the reliability, quality, completeness, readability, understandability, and actionability of websites that provide information on TACE for patients. Methods: The five most popular keywords pertaining to TACE were searched on Google, Yahoo, and Bing. General website characteristics and the presence of Health On the Net Foundation code certification were documented. Website assessment was performed using the following scores: DISCERN, Journal of the American Medical Association, Flesch-Kincaid Grade Level, Flesch Reading Ease Score, and the Patient Education Materials Assessment Tool. A novel TACE content score was generated to evaluate website completeness. Results: The search yielded 3750 websites. In total, 81 website entities belonging to 78 website domains met the inclusion criteria. A medical disclaimer was not provided on 28% (22/78) of website domains. Health On the Net code certification was present on 12% (9/78) of website domains. Authorship was absent on 88% (71/81) of websites, and sources were absent on 83% (67/81) of websites. The date of publication or of the last update was not listed on 58% (47/81) of websites. The median DISCERN score was 47.0 (IQR 40.5-54.0). The median TACE content score was 35 (IQR 27-43). The median readability grade level was in the 11th grade. Overall, 61% (49/81) and 16% (13/81) of websites were deemed understandable and actionable, respectively. Not-for-profit websites fared significantly better on the Journal of the American Medical Association, DISCERN, and TACE content scores. Conclusions: The content referring to TACE that is currently available on the web is unreliable, incomplete, difficult to read, understandable but not actionable, and characterized by low overall quality. Websites need to revise their content to optimally educate consumers and support shared decision-making. Trial registration: PROSPERO International Prospective Register of Systematic Reviews CRD42020202747; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020202747.

Project description: A growing body of experimental evidence suggests that microRNAs (miRNAs) are closely associated with specific human diseases and play critical roles in their development and progression. Therefore, identifying miRNAs related to specific diseases is of great significance for disease screening and treatment. In the early stages, the identification of associations between miRNAs and diseases demanded laborious and time-consuming biological experiments that often carried a substantial risk of failure. With the exponential growth in the number of potential miRNA-disease association combinations, traditional biological experimental methods face difficulties in processing massive amounts of data. Hence, developing more efficient computational methods to predict possible miRNA-disease associations and prioritize them is particularly necessary. In recent years, numerous deep learning-based computational methods have been developed and have demonstrated excellent performance. However, most of these methods rely on external databases or tools to compute various auxiliary information. Unfortunately, these external databases or tools often cover only a limited portion of miRNAs and diseases, so many miRNAs and diseases cannot be handled by these computational methods. This limits the practical application of these methods. To overcome these limitations, this study proposes a multi-view computational model called MVNMDA, which predicts potential miRNA-disease associations by integrating features of miRNAs and diseases from local views, global views, and semantic views. Specifically, MVNMDA utilizes known association information to construct initial node features. Then, multiple networks are constructed based on known associations to extract low-dimensional feature embeddings of all nodes. Finally, a cascaded attention classifier is proposed to fuse features from coarse to fine, suppressing noise within the features and making precise predictions. To validate the effectiveness of the proposed method, extensive experiments were conducted on the HMDD v2.0 and HMDD v3.2 datasets. The experimental results demonstrate that MVNMDA achieves better performance compared to other computational methods. Additionally, the case study results further demonstrate the reliable predictive performance of MVNMDA.
