Storing, combining and analysing turkey experimental data in the Big Data era.
ABSTRACT: With the increasing availability of large amounts of data in the livestock domain, we face the challenge of storing, combining and analysing these data efficiently. In this study, we explored the use of a data lake for storing and analysing data to improve scalability and interoperability. Data originated from a 2-day animal experiment in which the gait score of approximately 200 turkeys was determined through visual inspection by an expert. Additionally, inertial measurement units (IMUs), a 3D-video camera and a force plate (FP) were installed to explore the effectiveness of these sensors in automating the visual gait scoring. We deployed a data lake using the IMU and FP data of a single day of that animal experiment. This encompassed data from 84 turkeys, which we preprocessed by performing an 'extract, transform and load' (ETL) procedure. To test the scalability of the ETL procedure, we simulated increasing volumes of the available data from this animal experiment and computed the 'wall time' (elapsed real time) for converting FP data into comma-separated files and storing these files. With a simulated data set of 30 000 turkeys, the wall time decreased from 1 h with a single core to less than 15 min with 12 cores, demonstrating that the ETL procedure is scalable. Subsequently, a machine learning (ML) pipeline was developed to test the potential of the data lake to automatically distinguish between two classes, that is, very bad gait scores v. other scores. In conclusion, we have set up a dedicated, customized data lake, loaded data and developed a prediction model via the creation of an ML pipeline. A data lake appears to be a useful tool to face the challenge of storing, combining and analysing increasing volumes of data of varying nature in an effective manner.
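To illustrate the parallelised ETL step described in the abstract, the following Python sketch converts raw force-plate recordings into comma-separated files across multiple cores and reports the elapsed wall time. The directory layout, file extension and the parse_fp_file reader are assumptions for illustration only, not the authors' implementation.

import time
from multiprocessing import Pool
from pathlib import Path

import pandas as pd


def parse_fp_file(path: Path) -> pd.DataFrame:
    # Hypothetical reader: assume whitespace-delimited force signals per time step.
    return pd.read_csv(path, sep=r"\s+", names=["time_s", "force_n"])


def etl_one(path: Path) -> Path:
    # Extract and transform one raw FP recording, then load it as a CSV file.
    out = path.with_suffix(".csv")
    parse_fp_file(path).to_csv(out, index=False)
    return out


if __name__ == "__main__":
    raw_files = sorted(Path("datalake/raw/fp").glob("*.dat"))  # assumed layout
    start = time.perf_counter()
    with Pool(processes=12) as pool:  # compare e.g. 1 core v. 12 cores
        pool.map(etl_one, raw_files)
    print(f"wall time: {time.perf_counter() - start:.1f} s")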
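Similarly, the binary gait-score classification (very bad gait scores v. other scores) could be set up as a scikit-learn pipeline along the lines of the sketch below; the feature file, column names and score coding are assumed, and the study's actual ML pipeline may differ.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed curated feature table with one row per turkey (hypothetical path and columns).
features = pd.read_csv("datalake/curated/fp_imu_features.csv")
X = features.drop(columns=["turkey_id", "gait_score"])
y = (features["gait_score"] <= 1).astype(int)  # 1 = 'very bad' gait, assumed coding

# Standardise the features, then fit a logistic-regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")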
SUBMITTER: Schokker D
PROVIDER: S-EPMC7538337 | biostudies-literature | 2020 Nov
REPOSITORIES: biostudies-literature