Subtractive clustering analysis

A novel data mining method for finding cell subpopulations

Jacob N. Smith, Lisa Reece, Peter Szaniszlo, Leary, Rosemary C. Leary, James F. Leary

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A novel data mining program called "subtractive clustering" picks out the most important differences between two or more flow cytometry listmode data files. Making no assumptions about the data, the program uses a variable weight and skew metric to determine bin size, which allows subtractive clustering without bit-reduction or projection. In contrast, other subtraction methods, such as channel-by-channel subtraction, depend on dimensionality and resolution and can overestimate positive cells because they do not account for the overall distribution of the test and control data sets. The experimenter can use visual inspection of the data to choose an appropriate weight and skew metric, and thereby an optimal subtraction, but cannot directly modify the results. Choosing the largest bin size that can still differentiate clusters minimizes computation while still removing data. The choice of control weight sets the level of bin destruction during the subtraction stage: the smaller the weight, the more conservative the subtraction; the larger the weight, the more liberal. Three data sets illustrate full-dimensional subtraction, single-step biological data, and multi-stage subtraction to show definitive test results. Subtractive clustering conservatively removed control information, leaving the populations of interest. It provides a powerful comparison of clusters and is a first step toward finding non-obvious (hidden) differences and minimizing human prejudice during analysis.
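
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of binned, full-dimensional subtraction with a control weight. The function names (`bin_keys`, `subtract_control`), the fixed `bin_size`, and the bin-destruction rule (remove a test bin when the weighted, size-normalized control count reaches the test count) are all assumptions made for illustration, not the authors' published method. The skew metric is not modeled here.

```python
from collections import Counter

import numpy as np


def bin_keys(events, bin_size):
    """Map each event (row) to a tuple of integer bin indices, one per parameter.

    All parameters are binned together, so no projection onto fewer
    dimensions is performed.
    """
    return [tuple(row) for row in np.floor(np.asarray(events) / bin_size).astype(int)]


def subtract_control(test, control, bin_size, control_weight):
    """Return the test events that survive subtraction of the control data.

    Hypothetical rule (an assumption, not taken from the paper): a bin is
    destroyed when control_weight * normalized control count >= test count.
    """
    test = np.asarray(test, dtype=float)
    control = np.asarray(control, dtype=float)

    test_keys = bin_keys(test, bin_size)
    test_counts = Counter(test_keys)
    control_counts = Counter(bin_keys(control, bin_size))

    # Normalize for different numbers of events in the two files.
    scale = len(test) / len(control)

    destroyed = {
        key
        for key, n_test in test_counts.items()
        if control_weight * scale * control_counts.get(key, 0) >= n_test
    }
    keep = np.array([key not in destroyed for key in test_keys], dtype=bool)
    return test[keep]


# Synthetic two-parameter data: a shared background population plus a
# small population that is enriched in the test file.
control = np.array([[100, 100]] * 80 + [[400, 400]] * 2)
test = np.array([[105, 110]] * 80 + [[410, 420]] * 20)

conservative = subtract_control(test, control, bin_size=50, control_weight=0.5)
moderate = subtract_control(test, control, bin_size=50, control_weight=1.0)
liberal = subtract_control(test, control, bin_size=50, control_weight=10.0)

print(len(conservative), len(moderate), len(liberal))  # prints: 100 20 0
```

In this toy example a small weight removes nothing, a moderate weight removes only the shared background and leaves the enriched population, and a large weight removes everything, which mirrors the conservative-to-liberal behavior the abstract attributes to the control weight.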

Original language: English (US)
Article number: 51
Pages (from-to): 354-361
Number of pages: 8
Journal: Unknown Journal
Volume: 5699
DOI: https://doi.org/10.1117/12.589463
State: Published - 2005

Keywords

  • Data mining
  • Exploratory data analysis
  • Flow cytometry
  • Subtractive clustering

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Smith, J. N., Reece, L., Szaniszlo, P., Leary, Leary, R. C., & Leary, J. F. (2005). Subtractive clustering analysis: A novel data mining method for finding cell subpopulations. Unknown Journal, 5699, 354-361. [51]. https://doi.org/10.1117/12.589463
