Question

Problems with golubEsets dataset


王魏强 ▴ 10


Dear list,

While analyzing the golubEsets dataset, I applied a simple pre-processing procedure and noticed something strange. The pre-processing follows the steps suggested by S. Dudoit et al. (2002, JASA; personal communication with Pablo Tamayo): (i) thresholding: floor of 100 and ceiling of 16000; (ii) filtering: exclusion of genes with max/min <= 5 or (max - min) <= 500, where max and min refer respectively to the maximum and minimum expression levels of a particular gene across mRNA samples; (iii) base 10 logarithmic transformation.

After thresholding only, the dataset is a 7129 x 72 matrix in which 4260 entries (0.784%) equal 16000 and 242087 entries (47.164%) equal 100, 47.948% in total. After thresholding and filtering, the dataset is a 3571 x 72 matrix in which 987 entries (0.384%) equal 16000 and 50321 entries (19.572%) equal 100, 19.956% in total.

With so many values sitting exactly at the floor or the ceiling, I wonder whether any interesting expression pattern can be recovered from such a noisy dataset. I have written to the original author of the dataset, but unfortunately he could not give me a good reason. I am writing to the Bioconductor list in the hope that someone can offer an explanation.

Looking forward to your replies!

Wang Weiqiang
cinderole@sina.com
2003-10-26
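
For readers who want to reproduce the counts, the three steps described above can be sketched as follows. This is only an illustrative sketch in plain Python on a made-up 3 x 4 matrix, not the real Golub data and not the code used for the numbers in this post. The names `preprocess` and `fraction_at` are hypothetical helpers. The sketch assumes a gene is excluded when either filtering condition holds, so a gene is kept only if max/min > 5 and (max - min) > 500.

```python
import math


def preprocess(matrix, floor=100, ceiling=16000, min_ratio=5, min_diff=500):
    """Threshold, filter, and log10-transform a genes x samples matrix.

    Returns the matrix after each stage so the stages can be inspected.
    """
    # (i) Thresholding: clamp every value into [floor, ceiling].
    clamped = [[min(max(v, floor), ceiling) for v in row] for row in matrix]

    # (ii) Filtering (assumed reading of the rule): keep a gene only if
    # max/min > min_ratio AND (max - min) > min_diff across samples.
    kept = [
        row for row in clamped
        if max(row) / min(row) > min_ratio and max(row) - min(row) > min_diff
    ]

    # (iii) Base 10 logarithmic transformation of the remaining genes.
    logged = [[math.log10(v) for v in row] for row in kept]
    return clamped, kept, logged


def fraction_at(rows, value):
    """Fraction of all entries that are exactly equal to `value`."""
    total = sum(len(row) for row in rows)
    hits = sum(1 for row in rows for v in row if v == value)
    return hits / total if total else 0.0


# Toy data (invented for illustration): rows are genes, columns are samples.
toy = [
    [-20, 50, 3000, 120],    # strongly varying gene
    [80, 90, 110, 95],       # nearly flat gene
    [200, 20000, 150, 700],  # one value above the ceiling
]

clamped, kept, logged = preprocess(toy)
print("genes kept:", len(kept), "of", len(clamped))
print("fraction at floor after thresholding:", fraction_at(clamped, 100))
print("fraction at floor after filtering:", fraction_at(kept, 100))
print("fraction at ceiling after filtering:", fraction_at(kept, 16000))
```

Counting the floor and ceiling entries on the untransformed matrices, as `fraction_at` does here, corresponds to the percentages quoted in the post; after the log transformation the same entries would appear as 2 and log10(16000).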
