Dataset Information

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.


ABSTRACT:

Summary

Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.

Availability and implementation

Available under the open source MIT license at http://sourceforge.net/projects/seqpig/

SUBMITTER: Schumacher A 

PROVIDER: S-EPMC3866557 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature

Publications

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

André Schumacher, Luca Pireddu, Matti Niemenmaa, Aleksi Kallio, Eija Korpelainen, Gianluigi Zanetti, Keijo Heljanko

Bioinformatics (Oxford, England), 2013-10-22, issue 1



Similar Datasets

S-EPMC7212266 | biostudies-literature
S-EPMC4563723 | biostudies-literature
S-EPMC6276899 | biostudies-literature
S-EPMC8577285 | biostudies-literature
S-EPMC5602011 | biostudies-literature
S-EPMC7849385 | biostudies-literature
S-EPMC2943614 | biostudies-literature
S-EPMC3325791 | biostudies-other
S-EPMC9385160 | biostudies-literature
S-EPMC2375126 | biostudies-literature