王魏强
Dear list:
When I began analysing the golubEsets dataset and applied a
simple pre-processing step, I found a strange phenomenon.
The pre-processing steps follow the suggestion of S. Dudoit
et al. (2002, JASA, personal communication with Pablo Tamayo): (i)
thresholding: floor of 100 and ceiling of 16000; (ii) filtering:
exclusion of genes with max/min<=5 or (max-min)<=500, where max and
min refer respectively to the maximum and minimum expression levels of
a particular gene across mRNA samples; (iii) base 10 logarithmic
transformation.
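
For readers who want to reproduce the three steps, here is a minimal sketch in plain Python. It is not the code I actually ran; the function name `preprocess`, its parameter names and the toy matrix are made up for illustration. It assumes the filter keeps a gene only when both max/min > 5 and (max-min) > 500 hold after thresholding.

```python
import math

def preprocess(rows, floor=100.0, ceiling=16000.0, min_ratio=5.0, min_range=500.0):
    """rows is a list of genes, each a list of expression values across samples."""
    kept = []
    for values in rows:
        # (i) thresholding: clamp every value into [floor, ceiling]
        clipped = [min(max(v, floor), ceiling) for v in values]
        hi, lo = max(clipped), min(clipped)
        # (ii) filtering: keep the gene only if it varies enough across samples
        # (assumed reading: both criteria must be passed)
        if hi / lo > min_ratio and hi - lo > min_range:
            # (iii) base 10 logarithmic transformation
            kept.append([math.log10(v) for v in clipped])
    return kept

# Hypothetical toy matrix: 3 genes x 4 samples
toy = [
    [-20, 150, 900, 20000],  # clamped to 100..16000, passes both criteria
    [100, 120, 130, 110],    # nearly flat, removed
    [200, 300, 650, 900],    # max/min = 4.5, removed
]
filtered = preprocess(toy)
print(len(filtered), "of", len(toy), "genes kept")
```

Because lo is never below the floor of 100 after thresholding, the ratio hi / lo cannot divide by zero.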
With thresholding only, the dataset is summarized by a
7129*72 matrix in which 4260 entries (0.784%) have the value 16000
and 242087 (47.164%) have the value 100, 47.948% in total.
With thresholding and filtering, the dataset is summarized
by a 3571*72 matrix in which 987 entries (0.384%) have the value
16000 and 50321 (19.572%) have the value 100, 19.956% in total.
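
The kind of count reported above can be sketched as follows. Again this is only an illustration in plain Python: the helper name `boundary_share` and the toy values are hypothetical, not taken from the dataset.

```python
def boundary_share(rows, floor=100.0, ceiling=16000.0):
    """Return the percentage of entries at the floor and at the ceiling after thresholding."""
    total = 0
    at_floor = 0
    at_ceiling = 0
    for values in rows:
        for v in values:
            clipped = min(max(v, floor), ceiling)
            total += 1
            if clipped == floor:
                at_floor += 1
            elif clipped == ceiling:
                at_ceiling += 1
    return 100.0 * at_floor / total, 100.0 * at_ceiling / total

# Hypothetical toy matrix: 2 genes x 4 samples
toy = [
    [50, 100, 200, 16000],
    [17000, 300, 99, 5000],
]
pct_floor, pct_ceiling = boundary_share(toy)
print(f"at floor: {pct_floor:.3f}%, at ceiling: {pct_ceiling:.3f}%")
```

Values already equal to 100 or 16000 are counted together with the clamped ones, since they cannot be told apart after thresholding.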
I wonder whether we can get any interesting expression
patterns from such a noisy dataset. I have written to the original
author of the dataset, but unfortunately he could not give me a good
reason. I am writing to the Bioconductor list to see if someone can
give me an explanation.
Waiting for reply!
Wang Weiqiang
cinderole@sina.com
2003-10-26