Dataset Information

A thorough evaluation of the Language Environment Analysis (LENA) system.

ABSTRACT: In the previous decade, dozens of studies involving thousands of children across several research disciplines have made use of a combined daylong audio-recorder and automated algorithmic analysis called the LENA® system, which aims to assess children's language environment. While the system's prevalence in the language acquisition domain is steadily growing, there are only scattered validation efforts on only some of its key characteristics. Here, we assess the LENA® system's accuracy across all of its key measures: speaker classification, Child Vocalization Counts (CVC), Conversational Turn Counts (CTC), and Adult Word Counts (AWC). Our assessment is based on manual annotation of clips that have been randomly or periodically sampled out of daylong recordings, collected from (a) populations similar to the system's original training data (North American English-learning children aged 3-36 months), (b) children learning another dialect of English (UK), and (c) slightly older children growing up in a different linguistic and socio-cultural setting (Tsimane' learners in rural Bolivia). We find reasonably high accuracy in some measures (AWC, CVC), with more problematic levels of performance in others (CTC, precision of male adults and other children). Statistical analyses do not support the view that performance is worse for children who are dissimilar from the LENA® original training set. Whether LENA® results are accurate enough for a given research, educational, or clinical application depends largely on the specifics at hand. We therefore conclude with a set of recommendations to help researchers make this determination for their goals.

SUBMITTER: Cristia A

PROVIDER: S-EPMC7855224 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Purpose: To characterize, by specific biomarkers and nucleic acid sequencing, the structural and genomic sperm characteristics of partial (PG) and complete globozoospermic (CG) men in order to identify the best reproductive treatment. Methods: We assessed spermatozoa from 14 consenting men ultrastructurally, as well as for histone content, sperm chromatin integrity, and sperm aneuploidy. Additional genomic, transcriptomic, and proteomic evaluations were carried out to further characterize the CG cohort. The presence of oocyte-activating sperm cytosolic factor (OASCF) was measured by a phospholipase C zeta (PLCζ) immunofluorescence assay. Couples were treated in subsequent cycles either by conventional ICSI or by ICSI with assisted gamete treatment (AGT) using calcium ionophore (Ionomycin, 19657, Sigma-Aldrich, Saint Louis, MO, USA). Results: Ultrastructural assessment confirmed complete acrosome deficiency in all spermatozoa from CG men. Histone content, sperm chromatin integrity, and sperm aneuploidy did not differ significantly between the PG (n = 4) and CG (n = 10) cohorts. PLCζ assessment indicated a positive presence of OASCF in 4 PG couples, who underwent subsequent ICSI cycles that yielded a 36.1% (43/119) fertilization with a 50% (2/4) clinical pregnancy and delivery rate. PLCζ assessment failed to detect OASCF for 8 CG patients who underwent 9 subsequent ICSI cycles with AGT, yielding a remarkable improvement of fertilization (39/97; 40.2%) (P = 0.00001). Embryo implantation (6/21; 28.6%) and clinical pregnancies (5/7; 71.4%) were also enhanced, resulting in 4 deliveries. Gene mutations (DPY19L2, SPATA16, PICK1) were identified in spermatozoa from CG patients. Additionally, CG patients unable to sustain a term pregnancy had gene mutations involved in zygote development (NLRP5) and postnatal development (BSX). CG patients who successfully sustained a pregnancy had a mutation (PIWIL1) related to sperm phenotype. PLCZ1 was both mutated and underexpressed in these CG patients, regardless of reproductive outcome. Conclusions: Sperm bioassays and genomic studies can be used to characterize this gamete's capacity to support embryonic development and to tailor treatments maximizing reproductive outcome.

Project description: Top-down proteomics studies intact proteoform mixtures and offers important advantages over more common bottom-up proteomics technologies, as it avoids the protein inference problem. However, achieving complete molecular characterization of investigated proteoforms using existing technologies remains a fundamental challenge for top-down proteomics. Here, we benchmark the performance of ultraviolet photodissociation (UVPD) using 213 nm photons generated by a solid-state laser applied to the study of intact proteoforms from three organisms. Notably, the described UVPD setup applies multiple laser pulses to induce ion dissociation, and this feature can be used to optimize the fragmentation outcome based on the molecular weight of the analyzed biomolecule. When applied to complex proteoform mixtures in high-throughput top-down proteomics, 213 nm UVPD demonstrated a high degree of complementarity with the most employed fragmentation method in proteomics studies, higher-energy collisional dissociation (HCD). UVPD at 213 nm offered higher average proteoform sequence coverage and degree of proteoform characterization (including localization of post-translational modifications) than HCD. However, previous studies have shown limitations in applying database search strategies developed for HCD fragmentation to UVPD spectra, which contain up to nine fragment ion types. We therefore performed an analysis of the different UVPD product ion type frequencies. From these data, we developed an ad hoc fragment matching strategy and determined the influence of each possible ion type on search outcomes. By paring down the number of ion types considered in high-throughput UVPD searches from all types down to the four most abundant, we were ultimately able to achieve deeper proteome characterization with UVPD. Lastly, our detailed product ion analysis also revealed UVPD cleavage propensities and determined the presence of a product ion produced specifically by 213 nm photons. Altogether, these observations could be used to better elucidate UVPD dissociation mechanisms and improve the utility of the technique for proteomic applications.

Project description: With the advent of the Heliophysics/Geospace System Observatory (H/GSO), a complement of multi-spacecraft missions and ground-based observatories to study the space environment, data retrieval, analysis, and visualization of space physics data can be daunting. The Space Physics Environment Data Analysis System (SPEDAS), a grass-roots software development platform (www.spedas.org), is now officially supported by NASA Heliophysics as part of its data environment infrastructure. It serves more than a dozen space missions and ground observatories and can integrate the full complement of past and upcoming space physics missions with minimal resources, following clear, simple, and well-proven guidelines. Free, modular and configurable to the needs of individual missions, it works in both command-line (ideal for experienced users) and Graphical User Interface (GUI) mode (reducing the learning curve for first-time users). Both options have "crib-sheets," user-command sequences in ASCII format that can facilitate record-and-repeat actions, especially for complex operations and plotting. Crib-sheets enhance scientific interactions, as users can move rapidly and accurately from exchanges of technical information on data processing to efficient discussions regarding data interpretation and science. SPEDAS can readily query and ingest all International Solar Terrestrial Physics (ISTP)-compatible products from the Space Physics Data Facility (SPDF), enabling access to a vast collection of historic and current mission data. The planned incorporation of Heliophysics Application Programmer's Interface (HAPI) standards will facilitate data ingestion from distributed datasets that adhere to these standards. Although SPEDAS is currently Interactive Data Language (IDL)-based (and interfaces to Java-based tools such as Autoplot), efforts are underway to expand it further to work with Python (first as an interface tool and potentially even receiving an under-the-hood replacement). We review the SPEDAS development history, goals, and current implementation. We explain its "modes of use" with examples geared for users and outline its technical implementation and requirements with software developers in mind. We also describe SPEDAS personnel and software management, interfaces with other organizations, resources and support structure available to the community, and future development plans. Electronic Supplementary Material: The online version of this article (10.1007/s11214-018-0576-4) contains supplementary material, which is available to authorized users.
