I want to analyse differentially expressed genes between tumor and normal samples. I was hoping to start from the expression data in TCGA_PANCAN_exp_HiSeqV2_PANCAN. This contains 8415 tumor and normal samples across 27 tumor types, and I was envisioning some sort of ANOVA-like test to find the set of genes differentially expressed in tumors (maybe using edgeR). However, a lot of processing went into this data. Apparently each sample went through, at the very least: RSEM expected counts; normalization to its 75th percentile; log2(x+1) transformation; normalization between cohorts. So it is not the count data I've used before. Looking at the mean-variance plot (meanSdPlot(expr, ranks=FALSE)), there's quite a strong pattern (https://dl.dropboxusercontent.com/u/10824188/Screen%20Shot%202015-01-07%20at%2011.01.52%20AM.png). The genes near zero ("average expression", I believe) have the lowest variance. Is this data appropriate for my purpose? Is there any transformation I could do?
The RSEM expected counts from the TCGA project will work fine with either limma-voom or edgeR. However, with such a large number of samples, limma-voom is easily the best choice from a computational point of view. (Note I mean voom, not vooma.)
None of the other data columns are usable and you must not do any data transformation.
The two mean-variance plots that you give (from meanSdPlot and vooma) look very bad indeed, nonsensical really. There is no way that you should be getting a V-shape centred on zero as in these plots. You don't say what expression values or what design matrix you used to make these plots but, however they were made, they look incorrect.
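For reference, the limma-voom route recommended above might look roughly like the sketch below. It runs on simulated counts, since the TCGA matrix is not reproduced here. The object names (`counts`, `group`), the simulated data and the simple two-group design are all assumptions for illustration. With the real data, `counts` would hold the RSEM expected counts, and the design would probably also need a term for tumor type.

```r
# Minimal limma-voom sketch on simulated counts (a stand-in for RSEM expected counts).
library(limma)
library(edgeR)

set.seed(1)
n_genes <- 500
n_samples <- 12

# Hypothetical count matrix: genes in rows, samples in columns.
gene_mu <- 2^runif(n_genes, 2, 10)
counts <- matrix(rnbinom(n_genes * n_samples, mu = gene_mu, size = 5),
                 nrow = n_genes, ncol = n_samples)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_len(n_samples))

# Hypothetical sample annotation: six normals followed by six tumors.
group <- factor(rep(c("normal", "tumor"), each = 6), levels = c("normal", "tumor"))
design <- model.matrix(~ group)

# Filter low-expression genes and compute TMM normalization factors.
dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom estimates the mean-variance trend and returns log-CPM values with precision weights.
v <- voom(dge, design, plot = FALSE)

# Weighted linear model, empirical Bayes moderation, and a tumor-vs-normal ranking.
fit <- lmFit(v, design)
fit <- eBayes(fit)
top <- topTable(fit, coef = "grouptumor", number = 10)
```

Setting `plot = TRUE` in `voom` draws the mean-variance trend, which is a quick way to check that it looks like a smooth decreasing curve rather than a V-shape.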
If you want to use edgeR, the original counts would be the ideal input. You might be able to get away with "expected" counts, but once you start manipulating them with log-transformations and normalization, they're not going to be interpretable as counts anymore. A simple reversal of the log-transformation is not sufficient, as the normalization steps will have distorted the absolute size of the resulting values (which will affect the mean-variance relationship in edgeR's statistical model).
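A small simulation may make the point about reversing the log-transformation more concrete. It uses base R only. The four sequencing depths, the negative binomial parameters and the scaling target of 1000 are assumptions chosen for illustration, not a description of the exact TCGA pipeline.

```r
# Why undoing log2(x + 1) does not give counts back once the data were scaled first.
set.seed(1)
n_genes <- 1000
depth <- c(0.5, 1, 2, 4)            # relative depths of four hypothetical samples
base_mu <- 2^runif(n_genes, 0, 10)  # per-gene mean expression at depth 1

# Simulated counts: one column per sample, deeper samples get larger counts.
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = base_mu * d, size = 5))

# Upper-quartile scaling to an assumed target of 1000, followed by log2(x + 1).
uq <- apply(counts, 2, quantile, probs = 0.75)
scaled <- sweep(counts, 2, uq, "/") * 1000
logged <- log2(scaled + 1)

# Naive reversal of the log step only.
back <- 2^logged - 1

# The reversal recovers the scaled values, not the counts: every sample now has
# the same upper quartile, so the original library sizes are gone.
lib_size_counts <- colSums(counts)
lib_size_back <- colSums(back)
uq_back <- unname(apply(back, 2, quantile, probs = 0.75))
```

Comparing `lib_size_counts` with `lib_size_back` shows the eightfold spread in depth collapsing to roughly equal totals. A count-based model would then treat a shallow sample as if it carried as much information as a deep one.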
If you can only get access to the log-values, you might want to look into the vooma function from the limma package. This will estimate the mean-variance relationship in order to compute observation-specific precision weights. These weights can then be used for linear modelling of the log-values. That said, the mean-variance relationship that you've shown is quite bizarre and might interfere with proper modelling. I suspect that this is a result of one of the normalization steps, though I'm not familiar enough with TCGA processing to say that with any certainty.
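As a rough sketch of the vooma route described above, the code below fits a weighted linear model to a simulated matrix of log-values. The matrix `logexpr`, the sample grouping and the two-group design are hypothetical stand-ins for the real data. With the TCGA matrix, the design would probably need additional terms such as tumor type.

```r
# Minimal vooma sketch on simulated log2-scale expression values.
library(limma)

set.seed(1)
n_genes <- 500
n_samples <- 12

# Hypothetical log-expression matrix: genes in rows, samples in columns.
gene_mean <- runif(n_genes, 2, 12)
logexpr <- matrix(rnorm(n_genes * n_samples, mean = gene_mean, sd = 0.5),
                  nrow = n_genes, ncol = n_samples)
rownames(logexpr) <- paste0("gene", seq_len(n_genes))
colnames(logexpr) <- paste0("sample", seq_len(n_samples))

# Hypothetical sample annotation: six normals followed by six tumors.
group <- factor(rep(c("normal", "tumor"), each = 6), levels = c("normal", "tumor"))
design <- model.matrix(~ group)

# vooma estimates the mean-variance trend of the log-values and returns
# observation-specific precision weights alongside the expression matrix.
v <- vooma(logexpr, design, plot = FALSE)

# The weights are picked up automatically by lmFit.
fit <- lmFit(v, design)
fit <- eBayes(fit)
top <- topTable(fit, coef = "grouptumor", number = 10)
```

Calling `vooma` with `plot = TRUE` shows the fitted trend, which helps in judging whether an odd mean-variance pattern is likely to distort the weights.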