Abstract
The storage, manipulation, and especially internet transfer of the large amounts of data produced by High-Throughput Sequencing (HTS) instruments present major obstacles to realizing the full potential of this promising technology. The current standard is to store all of the data, which are produced in text formats (FASTQ and FASTA) and often stored in binary formats (SRA and BAM). To date, significant effort has been devoted to efficiently compressing these cumbersome sequencing data sets in their existing formats. However, given the substantial improvements in the quality of HTS data, we believe that if one can afford to exclude low-quality data and read headers, new, much more compact data formats can reduce the size of HTS data files by at least two orders of magnitude. Here we present several examples of file formats specifically designed to store only high-quality sequencing reads in space-efficient text and binary form.
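The abstract does not spell out the proposed formats, so what follows is only a minimal sketch of the general idea: drop read headers and quality strings, keep only high-quality reads, and write them either as plain text (one read per line) or packed at 2 bits per base. It is not the paper's format. Every name in it is hypothetical, and so are its parameters: Phred+33 quality encoding, a mean-quality cutoff of 30, rejection of any read containing a base other than A/C/G/T, the A=0, C=1, G=2, T=3 bit code, and a 16-bit little-endian length prefix per binary record.

```python
import io
import struct
from typing import Iterator, List, Tuple

# Hypothetical 2-bit code for each nucleotide (an assumption, not the paper's encoding).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def is_high_quality(seq: str, qual: str, min_mean_q: float = 30.0) -> bool:
    """Assumed filter: only A/C/G/T bases and mean Phred+33 quality >= min_mean_q."""
    if not seq or len(qual) != len(seq) or set(seq) - set(BASE_TO_BITS):
        return False
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
    return mean_q >= min_mean_q


def filter_reads(handle, min_mean_q: float = 30.0) -> List[str]:
    """Discard headers and quality strings; return only the passing sequences."""
    return [seq for _, seq, qual in read_fastq(handle)
            if is_high_quality(seq, qual, min_mean_q)]


def to_text(reads: List[str]) -> str:
    """Compact text form: one read per line, nothing else."""
    return "".join(r + "\n" for r in reads)


def pack_read(seq: str) -> bytes:
    """Binary form: 16-bit length prefix (reads up to 65535 bases), 4 bases per byte."""
    out = bytearray(struct.pack("<H", len(seq)))
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_reads(data: bytes) -> List[str]:
    """Inverse of pack_read, applied to a concatenation of packed reads."""
    reads, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        bases = []
        for k in range(length):
            byte = data[pos + k // 4]
            bases.append(BITS_TO_BASE[(byte >> (2 * (k % 4))) & 3])
        reads.append("".join(bases))
        pos += (length + 3) // 4  # bytes occupied by this read's bases
    return reads


if __name__ == "__main__":
    fastq = io.StringIO(
        "@read1 instrument:run:lane\nACGTACGTAC\n+\nIIIIIIIIII\n"
        "@read2 instrument:run:lane\nACGTNCGTAC\n+\nIIIIIIIIII\n"
        "@read3 instrument:run:lane\nTTTTGGGGCC\n+\n##########\n"
    )
    reads = filter_reads(fastq)
    text = to_text(reads)
    binary = b"".join(pack_read(r) for r in reads)
    print(reads)                   # ['ACGTACGTAC']
    print(len(text), len(binary))  # 11 5
```

In this toy input, read2 is rejected for its ambiguous base and read3 for its low quality. The surviving 10-base read then costs 11 bytes as text and 5 bytes as binary, compared with roughly 80 bytes for its original FASTQ record. The sketch makes no claim about the reduction the paper reports on real data.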
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1117-1122 |
Number of pages | 6 |
ISBN (Print) | 9781467367981 |
DOIs | |
State | Published - Dec 16 2015 |
Event | IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015, Washington, United States. Duration: Nov 9 2015 → Nov 12 2015 |
Keywords
- File Formats
- HTS Data
- HTS File Converter
ASJC Scopus subject areas
- Software
- Artificial Intelligence
- Health Informatics
- Biomedical Engineering