Dear R people,
I want to analyze genes that are differentially expressed between tumor and normal samples. I was hoping to start from the TCGA_PANCAN_exp_HiSeqV2_PANCAN expression data, which contains 8415 tumor and normal samples across 27 tumor types. I was envisioning some sort of ANOVA-like test (maybe using edgeR) to find the set of genes differentially expressed in tumors.

However, a lot of processing went into this data. As far as I can tell, each sample started as RSEM expected counts, was normalized to its 75th percentile, was log2(x+1) transformed, and was then normalized between cohorts (at the very least). So it is not the raw count data I've used before.

The mean-variance plot (meanSdPlot(expr, ranks=FALSE)) shows quite a strong pattern (https://dl.dropboxusercontent.com/u/10824188/Screen%20Shot%202015-01-07%20at%2011.01.52%20AM.png): the genes near zero on the x-axis (which I believe is average expression) have the lowest standard deviation. Is this data appropriate for my purpose? Is there any transformation I could apply?
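To make the processing chain concrete, here is a minimal sketch on a toy count matrix, written in plain Python so it has no package dependencies. It assumes that "normalized to its 75th percentile" means dividing each sample by its upper quartile and multiplying by a constant (the `scale` value is a hypothetical choice). It skips the cross-cohort normalization entirely, since the exact procedure isn't stated. All function names are hypothetical, and the counts are made up. Note that undoing log2(x+1) recovers only the normalized values, not the original RSEM expected counts, because the per-sample scaling factors are not part of the data.

```python
import math
import statistics


def upper_quartile_normalize(counts, scale=1000.0):
    """Scale each sample (column) so its 75th percentile equals `scale`.

    `counts` is a list of genes (rows), each a list of per-sample values.
    `scale` is a hypothetical constant chosen for illustration.
    """
    n_samples = len(counts[0])
    normalized = [[0.0] * n_samples for _ in counts]
    for j in range(n_samples):
        column = [row[j] for row in counts]
        # "inclusive" = linear interpolation between order statistics
        q75 = statistics.quantiles(column, n=4, method="inclusive")[2]
        if q75 == 0:
            raise ValueError(f"sample {j} has a 75th percentile of zero")
        for i, row in enumerate(counts):
            normalized[i][j] = row[j] / q75 * scale
    return normalized


def log2p1(matrix):
    """Apply log2(x + 1) element-wise."""
    return [[math.log2(x + 1) for x in row] for row in matrix]


def inverse_log2p1(matrix):
    """Undo log2(x + 1); yields normalized values, not raw counts."""
    return [[2.0 ** x - 1 for x in row] for row in matrix]


def mean_sd_by_gene(matrix):
    """Per-gene (row) mean and standard deviation, as in a mean-SD plot."""
    return [(statistics.mean(row), statistics.stdev(row)) for row in matrix]


# Toy data: 4 genes x 4 samples of made-up expected counts.
counts = [
    [0, 0, 1, 0],
    [10, 20, 15, 30],
    [100, 80, 120, 90],
    [1000, 1500, 800, 1200],
]

expr = log2p1(upper_quartile_normalize(counts))
for mean, sd in mean_sd_by_gene(expr):
    print(f"mean={mean:.2f}  sd={sd:.2f}")
```

The per-gene (mean, SD) pairs are what meanSdPlot draws when ranks=FALSE. On the real matrix, they show how strongly the spread depends on expression level, as in the linked plot.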
Thank you very much,