Problems with golubEsets dataset
王魏强
Dear list,

When I began analyzing the golubEsets dataset with a simple pre-processing step, I found a strange phenomenon. The pre-processing follows the suggestion of S. Dudoit et al. (2002, JASA; personal communication with Pablo Tamayo): (i) thresholding: floor of 100 and ceiling of 16000; (ii) filtering: exclusion of genes with max/min <= 5 and (max - min) <= 500, where max and min refer respectively to the maximum and minimum expression levels of a particular gene across mRNA samples; (iii) base 10 logarithmic transformation.

With thresholding only, the dataset is summarized by a 7129 × 72 matrix, in which 4260 entries (0.784%) have the value 16000 and 242087 entries (47.164%) have the value 100, 47.948% in total. With thresholding and filtering, the dataset is summarized by a 3571 × 72 matrix, in which 987 entries (0.384%) have the value 16000 and 50321 entries (19.572%) have the value 100, 19.956% in total.

I wonder whether we can get any interesting expression patterns from such a noisy dataset. I have written to the original author of the dataset, but unfortunately he could not give me a good reason. I am writing to the Bioconductor list to see if someone can give me an explanation.

Waiting for your reply!

Wang Weiqiang
cinderole@sina.com
2003-10-26
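For readers who want to reproduce the counts, the three steps and the boundary-value tally can be sketched as below. This is a minimal pure-Python illustration on a made-up toy matrix, not the golubEsets data and not the code used in the post; the function names and the `either` flag are assumptions introduced here. The post's wording ("and") excludes a gene only when both conditions hold, while an alternative reading excludes it when either holds, so the sketch makes that choice an explicit argument rather than guessing.

```python
import math

FLOOR, CEILING = 100.0, 16000.0


def threshold(matrix, floor=FLOOR, ceiling=CEILING):
    """Step (i): clamp every expression value to [floor, ceiling]."""
    return [[min(max(v, floor), ceiling) for v in row] for row in matrix]


def is_excluded(row, *, either, ratio=5.0, diff=500.0):
    """Step (ii): decide whether one gene (row) is filtered out.

    either=True  -> exclude if max/min <= ratio OR  max - min <= diff
    either=False -> exclude if max/min <= ratio AND max - min <= diff
    Rows are assumed to be thresholded already, so min(row) >= floor > 0.
    """
    lo, hi = min(row), max(row)
    low_ratio = hi / lo <= ratio
    low_diff = hi - lo <= diff
    return (low_ratio or low_diff) if either else (low_ratio and low_diff)


def preprocess(matrix, *, either):
    """Apply steps (i)-(iii); return (filtered matrix, log10 matrix)."""
    clamped = threshold(matrix)
    kept = [row for row in clamped if not is_excluded(row, either=either)]
    logged = [[math.log10(v) for v in row] for row in kept]
    return kept, logged


def boundary_share(matrix, floor=FLOOR, ceiling=CEILING):
    """Fraction of entries sitting exactly at the floor and at the ceiling."""
    values = [v for row in matrix for v in row]
    at_floor = sum(v == floor for v in values)
    at_ceiling = sum(v == ceiling for v in values)
    return at_floor / len(values), at_ceiling / len(values)


# Hypothetical toy data: 4 genes (rows) x 4 samples (columns).
toy = [
    [-20.0, 50.0, 90.0, 120.0],         # mostly below the floor
    [30.0, 800.0, 2500.0, 20000.0],     # spans floor and ceiling
    [150.0, 300.0, 700.0, 900.0],       # ratio 6, range 750
    [1000.0, 1200.0, 1400.0, 1600.0],   # ratio 1.6, range 600
]

kept_either, logged_either = preprocess(toy, either=True)
kept_both, logged_both = preprocess(toy, either=False)

print(boundary_share(threshold(toy)))   # thresholding only
print(boundary_share(kept_either))      # thresholding + filtering
```

The same ratio (count divided by rows × columns) is what the percentages in the post express. Applied to the quoted figures, 4260 of 7129 × 72 = 513288 entries works out to roughly 0.83%, so either that count or the 0.784% figure may contain a typo; the other three percentages agree with their counts.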