Proteomics

Dataset Information

Aird: A computation-oriented mass spectrometry data format enables a higher compression ratio and less decoding time


ABSTRACT: We describe "Aird", an open-source, computation-oriented format with controllable precision, flexible indexing strategies, and a high compression rate. Aird provides a novel compressor called Zlib-Diff-PforDelta (ZDPD) for m/z data. Compared with Zlib alone, m/z data size is about 55% lower in Aird on average. Owing to the high-speed encoding and decoding enabled by the Single Instruction Multiple Data (SIMD) technology used in ZDPD, Aird takes only 33% of the decoding time required by Zlib. We used the open dataset HYE, which contains 48 raw files from SCIEX TripleTOF 5600 and TripleTOF 6600 instruments. The total file size is 206 GB in the vendor format. It increases to 854 GB after conversion to mzML with 32-bit encoding precision, but is only 189 GB in Aird. Aird uses JavaScript Object Notation (JSON) for metadata storage. Aird-SDK is written in Java, and AirdPro, a GUI client for converting vendor files, is written in C#. Both are freely available at https://github.com/CSi-Studio/Aird-SDK and https://github.com/CSi-Studio/AirdPro.
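
To make the idea behind the "Diff" and "Zlib" stages of ZDPD more concrete, the following is a minimal Python sketch, not the Aird implementation. It converts sorted m/z values to fixed-point integers at a chosen precision, stores successive differences, and compresses the result with zlib. The PforDelta and SIMD stages are deliberately omitted, and the function names, the 1e-4 precision, and the synthetic spectrum are all assumptions made for illustration.

```python
import struct
import zlib


def to_fixed_point(mz_values, precision=1e-4):
    """Convert m/z floats to integers at an assumed precision (hypothetical default)."""
    return [round(mz / precision) for mz in mz_values]


def delta_encode(ints):
    """Keep the first value, then store the difference to each previous value."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(ints):
    """Serialize integers as little-endian signed 32-bit values."""
    return struct.pack(f"<{len(ints)}i", *ints)


def unpack(data):
    """Inverse of pack."""
    return list(struct.unpack(f"<{len(data) // 4}i", data))


def compress_zlib_only(ints):
    """Baseline: zlib applied directly to the packed integers."""
    return zlib.compress(pack(ints))


def compress_delta_zlib(ints):
    """Simplified stand-in for the Diff + Zlib stages (no PforDelta, no SIMD)."""
    return zlib.compress(pack(delta_encode(ints)))


def decompress_delta_zlib(blob):
    """Decompress and undo the delta step."""
    return delta_decode(unpack(zlib.decompress(blob)))


# Synthetic, sorted m/z values standing in for one spectrum (not real data).
mz = [100.0 + 0.0137 * i + 0.00005 * (i % 7) for i in range(5000)]
ints = to_fixed_point(mz)

plain = compress_zlib_only(ints)
delta = compress_delta_zlib(ints)

# The round trip is lossless at the chosen fixed-point precision.
assert decompress_delta_zlib(delta) == ints
print(f"zlib only: {len(plain)} bytes, delta + zlib: {len(delta)} bytes")
```

Because m/z values within a spectrum are sorted, the differences are small and repetitive, which is why a delta step tends to help a general-purpose compressor. The sizes printed for this synthetic input say nothing about the 55% figure reported in the abstract, which was measured on real data with the full ZDPD pipeline.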

INSTRUMENT(S): Q Exactive HF, Q Exactive

ORGANISM(S): Homo sapiens (human)

TISSUE(S): Cell Culture

SUBMITTER: Cong Xie

LAB HEAD: Miaoshan Lu

PROVIDER: PXD025142 | Pride | 2021-04-12

REPOSITORIES: Pride

Dataset's files

File | Type
MTBLS2119_raw.zip | Other
Negative_000333.aird | Other
Negative_000333.json | Other
Negative_000333.mgf | Mgf
Negative_000333.mz5 | Other
(Showing 5 of 41 files)

Similar Datasets

2015-04-25 | E-GEOD-57108 | biostudies-arrayexpress
2021-04-12 | PXD025310 |
| PRJNA949416 | ENA
2021-07-05 | PXD023449 | Pride
2013-09-05 | E-GEOD-50609 | biostudies-arrayexpress
2021-05-03 | PXD006191 | Pride
2014-07-30 | PXD000249 | Pride
2019-06-24 | GSE129576 | GEO
2019-06-24 | GSE129575 | GEO
2015-02-27 | E-MTAB-3368 | biostudies-arrayexpress