Slim-Filter

An interactive Windows-based application for Illumina Genome Analyzer data assessment and manipulation

George Golovko, Kamil Khanipov, Mark Rojas, Antonio Martinez-Alcántara, Jesse J. Howard, Efren Ballesteros, Sharu Gupta, William Widger, Yuriy Fofanov

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background: The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week. Effective analysis and manipulation of these data is limited due to large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps. This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider. Furthermore, such services are often performed using specialized computational facilities.

Results: We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks including: filtration of low quality and low complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during the sample and sequencing libraries preparation steps); excluding duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting reads by copy numbers allowing for easy access and manual editing of the resulting files. Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads. Each data manipulation step has roll-back abilities, allowing for return to previous steps of the data analysis process. Slim-Filter is written in C++ and is compatible with fasta, fastq, and specialized AS file formats presented in this manuscript. Setup files and a user's manual are available for download at the supplementary web site (https://www.bioinfo.uh.edu/Slim_Filter/).

Conclusion: The presented Windows-based application has been developed with the goal of providing individual investigators with integrated sequencing reads analysis, curation, and manipulation capabilities.
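The manipulation tasks listed in the abstract (quality and complexity filtration, removal of reads containing undesired subsequences, collapsing duplicates with copy numbers, and sorting by copy number) can be illustrated with a minimal Python sketch. This is not Slim-Filter's code or its AS format: the function names, the Phred offset of 33, the thresholds, the single-base complexity measure, and the example subsequence are all assumptions chosen for illustration.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (offset 33 is an assumption)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def is_low_complexity(seq, max_base_fraction=0.8):
    """Flag reads dominated by one base (a crude, hypothetical complexity measure)."""
    return max(Counter(seq).values()) / len(seq) > max_base_fraction


def filter_and_collapse(reads, min_quality=20, unwanted=()):
    """Filter (sequence, quality) pairs, then collapse duplicates.

    Returns (sequence, copy_number) pairs sorted by descending copy number,
    with ties broken alphabetically by sequence.
    """
    counts = Counter()
    for seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # skip malformed records
        if mean_phred(qual) < min_quality:
            continue  # low quality
        if is_low_complexity(seq):
            continue  # low complexity
        if any(sub in seq for sub in unwanted):
            continue  # contains an undesired subsequence
        counts[seq] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input: 'I' encodes Phred 40 and '#' encodes Phred 2 at offset 33.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),
    ("ACGTACGTAC", "IIIIIIIIII"),
    ("TTGCAAGCTA", "IIIIIIIIII"),
    ("AAAAAAAAAA", "IIIIIIIIII"),          # low complexity
    ("GGCATTCGAT", "##########"),          # low quality
    ("CCGATCGGAAGAGC", "IIIIIIIIIIIIII"),  # contains the unwanted subsequence
]
result = filter_and_collapse(reads, unwanted=("GATCGGAAGAGC",))
print(result)
```

Keeping a copy number per unique sequence, rather than every duplicate record, is what lets the collapsed output stay small enough to sort and inspect by hand.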

Original language: English (US)
Article number: 166
Journal: BMC Bioinformatics
Volume: 13
Issue number: 1
DOIs: https://doi.org/10.1186/1471-2105-13-166
State: Published - Jul 16 2012
Externally published: Yes

Fingerprint

Manipulation
Genome
Genes
Sequencing
Research Personnel
Filter
Bioinformatics
Sorting
Filtration
Websites
Computational Biology
Statistical property
Technology
Polymerase Chain Reaction
Quality Assessment
C++
Subsequence
Low Complexity
Preparation
Data analysis

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Slim-Filter: An interactive Windows-based application for Illumina Genome Analyzer data assessment and manipulation. / Golovko, George; Khanipov, Kamil; Rojas, Mark; Martinez-Alcántara, Antonio; Howard, Jesse J.; Ballesteros, Efren; Gupta, Sharu; Widger, William; Fofanov, Yuriy.

In: BMC Bioinformatics, Vol. 13, No. 1, 166, 16.07.2012.

@article{5e51f9e2895e4b2fb876891f715eb4ea,
title = "Slim-Filter: An interactive {Windows}-based application for {Illumina Genome Analyzer} data assessment and manipulation",
author = "George Golovko and Kamil Khanipov and Mark Rojas and Antonio Martinez-Alc{\'a}ntara and Howard, {Jesse J.} and Efren Ballesteros and Sharu Gupta and William Widger and Yuriy Fofanov",
year = "2012",
month = "7",
day = "16",
doi = "10.1186/1471-2105-13-166",
language = "English (US)",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Slim-Filter

T2 - An interactive Windows-based application for Illumina Genome Analyzer data assessment and manipulation

AU - Golovko, George

AU - Khanipov, Kamil

AU - Rojas, Mark

AU - Martinez-Alcántara, Antonio

AU - Howard, Jesse J.

AU - Ballesteros, Efren

AU - Gupta, Sharu

AU - Widger, William

AU - Fofanov, Yuriy

PY - 2012/7/16

Y1 - 2012/7/16

UR - http://www.scopus.com/inward/record.url?scp=84870007007&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870007007&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-166

DO - 10.1186/1471-2105-13-166

M3 - Article

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 166

ER -