Abstract: DNA microarray technology has generated an explosion of gene expression data, and efficient methods are urgently needed to analyse and visualize such massive datasets. We investigate the data preprocessing methods used in cluster analysis, namely normalization and logarithmic transformation of the expression matrix, using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that, when the Euclidean distance is used as the metric, the logarithm of the relative expression level is the best preprocessing method, whereas normalized data fail to give the expected results because the data structure is destroyed. If there are only a few principal components, PCA is an effective method for extracting the frame structure, while SOMs are more suitable for a specific structure.
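The contrast between the two preprocessing routes can be illustrated with a small sketch. It is not the authors' procedure: the toy ratios, the helper names `log_ratio`, `zscore_rows` and `euclidean`, and the reading of "normalization" as scaling each gene profile to zero mean and unit variance are all assumptions made for illustration.

```python
import numpy as np

def log_ratio(ratios):
    """log2 of relative expression levels (hypothetical helper)."""
    return np.log2(ratios)

def zscore_rows(matrix):
    """Scale each row (gene) to zero mean and unit variance.
    This is an assumed reading of 'normalization', not taken from the paper."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    return centered / matrix.std(axis=1, keepdims=True)

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return float(np.linalg.norm(a - b))

# Toy relative expression ratios (made up): rows are genes, columns are conditions.
ratios = np.array([
    [1.0, 2.0, 4.0],    # gene A: induced up to 4-fold
    [1.0, 0.5, 0.25],   # gene B: repressed up to 4-fold
    [1.0, 1.1, 1.2],    # gene C: nearly flat
])

logged = log_ratio(ratios)
normalized = zscore_rows(ratios)

# Distance of genes A and B from an unchanged profile, before and after the log.
unchanged_raw = np.ones(3)
unchanged_log = np.zeros(3)
raw_up = euclidean(ratios[0], unchanged_raw)
raw_down = euclidean(ratios[1], unchanged_raw)
log_up = euclidean(logged[0], unchanged_log)
log_down = euclidean(logged[1], unchanged_log)

# Distance between the strongly induced gene A and the nearly flat gene C.
dist_ac_log = euclidean(logged[0], logged[2])
dist_ac_norm = euclidean(normalized[0], normalized[2])

print(f"raw: up={raw_up:.3f} down={raw_down:.3f}")
print(f"log: up={log_up:.3f} down={log_down:.3f}")
print(f"A-C distance: log={dist_ac_log:.3f} z-scored={dist_ac_norm:.3f}")
```

In this toy case the log transform makes a 4-fold induction and a 4-fold repression equally distant from the unchanged profile, while row-wise scaling pulls the nearly flat gene close to the strongly induced one. This is one possible way in which amplitude information could be lost under Euclidean distance; the paper itself may define normalization differently.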
YANG Chun-Mei; WAN Bai-Kun; GAO Xiao-Feng. Data Preprocessing in Cluster Analysis of Gene Expression[J]. Chinese Physics Letters, 2003, 20(5): 774-777.
YANG Chun-Mei, WAN Bai-Kun, GAO Xiao-Feng. Data Preprocessing in Cluster Analysis of Gene Expression. Chin. Phys. Lett., 2003, 20(5): 774-777.