Browse
Submit Data
Databases
API
Help

Dataset Information

11 Views

0 Connections

0 Citations

0 Reanalyses

0 Downloads

Omics score: 0

Vocal development in a large-scale crosslinguistic corpus.

ABSTRACT: This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., "ba" vs. "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocalization corpus that is available for other researchers to use in future work.

SUBMITTER: Cychosz M

PROVIDER: S-EPMC8310893 | biostudies-literature |

REPOSITORIES: biostudies-literature

ACCESS DATA

Json Xml

Similar Datasets

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding.

Project description:Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meaningful captions on sketched images often included in scientific and technical documents. The advancement of other tasks such as 3D reconstruction from 2D images requires larger datasets with multiple viewpoints. We introduce DeepPatent2, a large-scale dataset, providing more than 2.7 million technical drawings with 132,890 object names and 22,394 viewpoints extracted from 14 years of US design patent documents. We demonstrate the usefulness of DeepPatent2 with conceptual captioning. We further provide the potential usefulness of our dataset to facilitate other research areas such as 3D image reconstruction and image retrieval.

| S-EPMC10630310 | biostudies-literature

What Do North American Babies Hear? A large-scale cross-corpus analysis.

Project description:A range of demographic variables influences how much speech young children hear. However, because studies have used vastly different sampling methods, quantitative comparison of interlocking demographic effects has been nearly impossible, across or within studies. We harnessed a unique collection of existing naturalistic, day-long recordings from 61 homes across four North American cities to examine language input as a function of age, gender, and maternal education. We analyzed adult speech heard by 3- to 20-month-olds who wore audio recorders for an entire day. We annotated speaker gender and speech register (child-directed or adult-directed) for 10,861 utterances from female and male adults in these recordings. Examining age, gender, and maternal education collectively in this ecologically valid dataset, we find several key results. First, the speaker gender imbalance in the input is striking: children heard 2-3× more speech from females than males. Second, children in higher-maternal education homes heard more child-directed speech than those in lower-maternal education homes. Finally, our analyses revealed a previously unreported effect: the proportion of child-directed speech in the input increases with age, due to a decrease in adult-directed speech with age. This large-scale analysis is an important step forward in collectively examining demographic variables that influence early development, made possible by pooled, comparable, day-long recordings of children's language environments. The audio recordings, annotations, and annotation software are readily available for reuse and reanalysis by other researchers.

| S-EPMC6294666 | biostudies-literature

Caveat emptor, computational social science: Large-scale missing data in a widely-published Reddit corpus.

Project description:As researchers use computational methods to study complex social behaviors at scale, the validity of this computational social science depends on the integrity of the data. On July 2, 2015, Jason Baumgartner published a dataset advertised to include "every publicly available Reddit comment" which was quickly shared on Bittorrent and the Internet Archive. This data quickly became the basis of many academic papers on topics including machine learning, social behavior, politics, breaking news, and hate speech. We have discovered substantial gaps and limitations in this dataset which may contribute to bias in the findings of that research. In this paper, we document the dataset, substantial missing observations in the dataset, and the risks to research validity from those gaps. In summary, we identify strong risks to research that considers user histories or network analysis, moderate risks to research that compares counts of participation, and lesser risk to machine learning research that avoids making representative claims about behavior and participation on Reddit.

| S-EPMC6034852 | biostudies-literature

The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms.

Project description:Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. The COUGHVID dataset provides over 25,000 crowdsourced cough recordings representing a wide range of participant ages, genders, geographic locations, and COVID-19 statuses. First, we contribute our open-sourced cough detection algorithm to the research community to assist in data robustness assessment. Second, four experienced physicians labeled more than 2,800 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.

| S-EPMC8222356 | biostudies-literature

The corpus callosum as anatomical marker of intelligence? A critical examination in a large-scale developmental study.

Project description:Intellectual abilities are supported by a large-scale fronto-parietal brain network distributed across both cerebral hemispheres. This bihemispheric network suggests a functional relevance of inter-hemispheric coordination, a notion which is supported by a series of recent structural magnetic resonance imaging (MRI) studies demonstrating correlations between intelligence scores (IQ) and corpus-callosum anatomy. However, these studies also reveal an age-related dissociation: mostly positive associations are reported in adult samples, while negative associations are found in developing samples. In the present study, we re-examine the association between corpus callosum and intelligence measures in a large (734 datasets from 495 participants) developmental mixed cross-sectional and longitudinal sample (6.4-21.9 years) using raw test scores rather than deviation IQ measures to account for the ongoing cognitive development in this age period. Analyzing mid-sagittal measures of regional callosal thickness, a positive association in the splenium of the corpus callosum was found for both verbal and performance raw test scores. This association was not present when the participants' age was considered in the analysis. Thus, we did not reveal any association that cannot be explained by a temporal co-occurrence of overall developmental trends in intellectual abilities and corpus callosum maturation in the present developing sample.

| S-EPMC5772147 | biostudies-literature

Large-scale mapping of mutations affecting zebrafish development.

Project description:BackgroundLarge-scale mutagenesis screens in the zebrafish employing the mutagen ENU have isolated several hundred mutant loci that represent putative developmental control genes. In order to realize the potential of such screens, systematic genetic mapping of the mutations is necessary. Here we report on a large-scale effort to map the mutations generated in mutagenesis screening at the Max Planck Institute for Developmental Biology by genome scanning with microsatellite markers.ResultsWe have selected a set of microsatellite markers and developed methods and scoring criteria suitable for efficient, high-throughput genome scanning. We have used these methods to successfully obtain a rough map position for 319 mutant loci from the Tübingen I mutagenesis screen and subsequent screening of the mutant collection. For 277 of these the corresponding gene is not yet identified. Mapping was successful for 80 % of the tested loci. By comparing 21 mutation and gene positions of cloned mutations we have validated the correctness of our linkage group assignments and estimated the standard error of our map positions to be approximately 6 cM.ConclusionBy obtaining rough map positions for over 300 zebrafish loci with developmental phenotypes, we have generated a dataset that will be useful not only for cloning of the affected genes, but also to suggest allelism of mutations with similar phenotypes that will be identified in future screens. Furthermore this work validates the usefulness of our methodology for rapid, systematic and inexpensive microsatellite mapping of zebrafish mutations.

| S-EPMC1781435 | biostudies-literature

Large scale analysis of apple development within HIVW apple progeny

Project description:Based on sensorial analysis, 8 apple genotypes with contrasted fruit texture for mealiness were selected among a progeny. Apple samples were collected at 60 days after flowering (60DAF), 110 days after flowering (110DAF), harvest (Rec), and after 1 or 2 months of cold storage (1M and 2M respectively).

2016-09-26 | GSE64079 | GEO

Creating a Corpus of Multilingual Parent-Child Speech Remotely: Lessons Learned in a Large-Scale Onscreen Picturebook Sharing Task.

Project description:With lockdowns and social distancing measures in place, research teams looking to collect naturalistic parent-child speech interactions have to develop alternatives to in-lab recordings and observational studies with long-stretch recordings. We designed a novel micro-longitudinal study, the Talk Together Study, which allowed us to create a rich corpus of parent-child speech interactions in a fully online environment (N participants = 142, N recordings = 410). In this paper, we discuss the methods we used, and the lessons learned during adapting and running the study. These lessons learned cover nine domains of research design, monitoring and feedback: Recruitment strategies, Surveys and Questionnaires, Video-call scheduling, Speech elicitation tools, Videocall protocols, Participant remuneration strategies, Project monitoring, Participant retention, and Data Quality, and may be used as a primer for teams planning to conduct remote studies in the future.

| S-EPMC8641650 | biostudies-literature

Development of large-scale functional brain networks in children.

Project description:The ontogeny of large-scale functional organization of the human brain is not well understood. Here we use network analysis of intrinsic functional connectivity to characterize the organization of brain networks in 23 children (ages 7-9 y) and 22 young-adults (ages 19-22 y). Comparison of network properties, including path-length, clustering-coefficient, hierarchy, and regional connectivity, revealed that although children and young-adults' brains have similar "small-world" organization at the global level, they differ significantly in hierarchical organization and interregional connectivity. We found that subcortical areas were more strongly connected with primary sensory, association, and paralimbic areas in children, whereas young-adults showed stronger cortico-cortical connectivity between paralimbic, limbic, and association areas. Further, combined analysis of functional connectivity with wiring distance measures derived from white-matter fiber tracking revealed that the development of large-scale brain networks is characterized by weakening of short-range functional connectivity and strengthening of long-range functional connectivity. Importantly, our findings show that the dynamic process of over-connectivity followed by pruning, which rewires connectivity at the neuronal level, also operates at the systems level, helping to reconfigure and rebalance subcortical and paralimbic connectivity in the developing brain. Our study demonstrates the usefulness of network analysis of brain connectivity to elucidate key principles underlying functional brain maturation, paving the way for novel studies of disrupted brain connectivity in neurodevelopmental disorders such as autism.

| S-EPMC2705656 | biostudies-literature

Assessing large-scale wildlife responses to human infrastructure development.

Project description:Habitat loss and deterioration represent the main threats to wildlife species, and are closely linked to the expansion of roads and human settlements. Unfortunately, large-scale effects of these structures remain generally overlooked. Here, we analyzed the European transportation infrastructure network and found that 50% of the continent is within 1.5 km of transportation infrastructure. We present a method for assessing the impacts from infrastructure on wildlife, based on functional response curves describing density reductions in birds and mammals (e.g., road-effect zones), and apply it to Spain as a case study. The imprint of infrastructure extends over most of the country (55.5% in the case of birds and 97.9% for mammals), with moderate declines predicted for birds (22.6% of individuals) and severe declines predicted for mammals (46.6%). Despite certain limitations, we suggest the approach proposed is widely applicable to the evaluation of effects of planned infrastructure developments under multiple scenarios, and propose an internationally coordinated strategy to update and improve it in the future.

| S-EPMC4968732 | biostudies-literature

OmicsDI is part of the ELIXIR infrastructure

OmicsDI is an Elixir interoperability service. Learn more ›

Tweets

OmicsDI Databases

PRIDE
PeptideAtlas
MassIVE
JPOST Repository
Physiome Model Repository

EGA
EVA
ENA
LINCS
PAXDB
Cell Collective

MetaboLights
Metabolomics Workbench
MetabolomeExpress
GNPS
BioModels
FAIRDOMHub

ArrayExpress
dbGaP
ExpressionAtlas
GEO
NODE

Information

Databases
Help
API
Contact us
Code on GitHub
Terms of use
Submit Data