Original Articles

Data Preprocessing in Cluster Analysis of Gene Expression
YANG Chun-Mei1; WAN Bai-Kun1; GAO Xiao-Feng2
1College of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072
2Motorola (China) Electronics Ltd., Tianjin 300457

Cite this article:
YANG Chun-Mei, WAN Bai-Kun, GAO Xiao-Feng 2003 Chin. Phys. Lett. 20 774-777

Abstract DNA microarray technology has generated an explosion of gene expression data, and efficient methods are urgently needed to analyse and visualize such massive datasets. We investigate the data preprocessing methods used in cluster analysis, namely normalization and logarithmic transformation of the expression matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that, when the Euclidean distance is used as the distance metric, taking the logarithm of the relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.
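
To make the effect of preprocessing on the Euclidean distance concrete, the following is a minimal sketch and not the authors' code. The expression profiles are hypothetical, and "normalization" is assumed here to mean per-gene scaling to zero mean and unit variance, which may differ from the definition used in the article. It shows two things: the log2 transform places a 2-fold increase and a 2-fold decrease at the same distance from an unchanged profile, and the assumed normalization makes a weak and a strong response of the same shape indistinguishable.

```python
import math


def log2_transform(profile):
    """Log2 of relative expression levels (all ratios must be positive)."""
    return [math.log2(x) for x in profile]


def standardize(profile):
    """Assumed normalization: scale one gene's profile to zero mean, unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if std == 0.0:
        # A flat profile has no variance to scale; return all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in profile]


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical relative expression ratios over three conditions.
baseline = [1.0, 1.0, 1.0]      # unchanged gene
induced = [1.0, 2.0, 4.0]       # 2-fold, then 4-fold up
repressed = [1.0, 0.5, 0.25]    # 2-fold, then 4-fold down
weak = [1.0, 1.1, 1.2]          # small linear increase
strong = [1.0, 3.0, 5.0]        # large linear increase

# Raw ratios: induction looks much farther from baseline than repression.
raw_up = euclidean(induced, baseline)
raw_down = euclidean(repressed, baseline)

# Log2 ratios: the two distances become equal.
log_up = euclidean(log2_transform(induced), log2_transform(baseline))
log_down = euclidean(log2_transform(repressed), log2_transform(baseline))

# Assumed normalization: the weak and strong responses collapse together.
raw_gap = euclidean(weak, strong)
norm_gap = euclidean(standardize(weak), standardize(strong))

print("raw:", raw_up, raw_down)
print("log2:", log_up, log_down)
print("weak vs strong, raw and normalized:", raw_gap, norm_gap)
```

The same distance function could feed any of the clustering methods named in the abstract; only the preprocessing step changes what counts as "close".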
Keywords:
87.80.Tq
07.05.Kf
89.70.+c
87.14.Gg
Published: 01 May 2003