Weighted neighborhood classifier for the classification of imbalanced tumor dataset

Shu Lin Wang, Xueling Li, Jun Feng Xia, Xiao Ping Zhang

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Machine learning is widely applied to molecular tumor classification based on gene expression profiles, but the sample imbalance problem is often overlooked. This paper proposes a subclass-weighted neighborhood classifier to address the imbalanced sample set problem, together with a novel neighborhood rough set model that selects informative genes to improve classification performance. Experiments on three publicly available tumor datasets demonstrate that the proposed method is clearly effective both on imbalanced datasets with an obscure boundary between two subtypes and for informative gene selection, and that it achieves higher cross-validation accuracy with far fewer tumor-related genes.
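The abstract names the two techniques without defining them, so the following is only a minimal pure-Python sketch of the general ideas, not the authors' method. It assumes the standard δ-neighborhood rough set model (dependency = size of the positive region), used for greedy forward gene selection, in place of the paper's "novel" model, and it weights each class's neighborhood vote by the inverse of that class's training-set size as a stand-in for the paper's subclass weighting. The function names and the radius parameter `delta` are hypothetical, and gene values are assumed to be pre-scaled to [0, 1].

```python
import math
from collections import Counter


def _dist(a, b, features):
    """Euclidean distance between samples a and b restricted to `features`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def weighted_neighborhood_predict(X_train, y_train, x, delta):
    """Class-weighted neighborhood vote (hypothetical weighting: 1 / class size)."""
    features = range(len(x))
    class_size = Counter(y_train)
    neighbors = [yi for xi, yi in zip(X_train, y_train)
                 if _dist(xi, x, features) <= delta]
    if not neighbors:
        # Empty delta-neighborhood: fall back to the single nearest sample.
        nearest = min(range(len(X_train)),
                      key=lambda i: _dist(X_train[i], x, features))
        return y_train[nearest]
    votes = Counter(neighbors)
    # Normalize each class's vote count by its training size, so a large
    # majority class cannot win merely because it has more samples nearby.
    return max(votes, key=lambda c: votes[c] / class_size[c])


def dependency(X, y, features, delta):
    """gamma_B(D): share of samples whose delta-neighborhood on `features`
    contains only samples of their own class (the positive region)."""
    if not features:
        return 0.0
    consistent = 0
    for xi, yi in zip(X, y):
        if all(yj == yi for xj, yj in zip(X, y)
               if _dist(xi, xj, features) <= delta):
            consistent += 1
    return consistent / len(X)


def greedy_select(X, y, delta, max_features):
    """Forward selection: repeatedly add the gene that most increases gamma."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, gene = max((dependency(X, y, selected + [g], delta), g)
                          for g in remaining)
        if score <= best:
            break  # no remaining gene enlarges the positive region
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected, best
```

For example, with ten training samples on one gene, eight of class `"a"` (three of them near 0.2) and two of class `"b"` (at 0.3 and 0.35), a query at 0.25 with `delta = 0.2` sees three `"a"` and two `"b"` neighbors. An unweighted vote would return `"a"`, while the inverse-size weighting scores `"a"` at 3/8 and `"b"` at 2/2 and returns the minority class `"b"`. The choice of `delta` controls both the neighborhood size in the classifier and the granularity of the rough set dependency, so in practice it would need tuning, for example by cross-validation.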

Original language: English (US)
Pages (from-to): 259-273
Number of pages: 15
Journal: Journal of Circuits, Systems and Computers
Volume: 19
Issue number: 1
DOIs: 10.1142/S0218126610006232
State: Published - Feb 2010
Externally published: Yes

Keywords

  • Gene expression profiles
  • Imbalanced dataset
  • Kruskal-Wallis rank sum test
  • Molecular tumor classification
  • Neighborhood rough set model
  • Weighted neighborhood classifier

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Cite this

Weighted neighborhood classifier for the classification of imbalanced tumor dataset. / Wang, Shu Lin; Li, Xueling; Xia, Jun Feng; Zhang, Xiao Ping.

In: Journal of Circuits, Systems and Computers, Vol. 19, No. 1, 02.2010, p. 259-273.

Wang, Shu Lin ; Li, Xueling ; Xia, Jun Feng ; Zhang, Xiao Ping. / Weighted neighborhood classifier for the classification of imbalanced tumor dataset. In: Journal of Circuits, Systems and Computers. 2010 ; Vol. 19, No. 1. pp. 259-273.

@article{254bcdb2a9e446aea868a3f0ae1d520c,
title = "Weighted neighborhood classifier for the classification of imbalanced tumor dataset",
keywords = "Gene expression profiles, Imbalanced dataset, Kruskal-Wallis rank sum test, Molecular tumor classification, Neighborhood rough set model, Weighted neighborhood classifier",
author = "Wang, {Shu Lin} and Xueling Li and Xia, {Jun Feng} and Zhang, {Xiao Ping}",
year = "2010",
month = "2",
doi = "10.1142/S0218126610006232",
language = "English (US)",
volume = "19",
pages = "259--273",
journal = "Journal of Circuits, Systems and Computers",
issn = "0218-1266",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "1",

}

TY - JOUR

T1 - Weighted neighborhood classifier for the classification of imbalanced tumor dataset

AU - Wang, Shu Lin

AU - Li, Xueling

AU - Xia, Jun Feng

AU - Zhang, Xiao Ping

PY - 2010/2

Y1 - 2010/2

KW - Gene expression profiles

KW - Imbalanced dataset

KW - Kruskal-Wallis rank sum test

KW - Molecular tumor classification

KW - Neighborhood rough set model

KW - Weighted neighborhood classifier

UR - http://www.scopus.com/inward/record.url?scp=77951570192&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951570192&partnerID=8YFLogxK

U2 - 10.1142/S0218126610006232

DO - 10.1142/S0218126610006232

M3 - Article

AN - SCOPUS:77951570192

VL - 19

SP - 259

EP - 273

JO - Journal of Circuits, Systems and Computers

JF - Journal of Circuits, Systems and Computers

SN - 0218-1266

IS - 1

ER -