Correct assumptions of using limma moderated t-test
2
3
Entering edit mode
svlachavas ▴ 800
@svlachavas-7225
Last seen 3 days ago
Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

as I'm currently writting a report about my gene expression analysis on two affymetrix microarray datasets, regarding differential expression, i found an paper (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0012336) mentioning the necessary assumptions and "possible requirements" in order to use different categories of statistical tests. In my case, i performed paired limma moderated t-test to check for gene expression alterations between cancer and control samples in each patient comprized the 2 above mentioned datasets, but as im "fresh in R" i have checked the normality of my data(boxplots,histograms,Q-Q plots after normalization), but i havent cheched for equal variance !! As i have read from the above paper limma is a homoscedastic test(thus makes the assumption for equal variances between the groups of interest) could i have violated my results regarding the false positive rate ? Or due to the paired nature of my analysis(and thus not generally two group comparison) this does not affect my study ?

Finally, if this pinpoint a great concern, how could i check in R and in both datasets prior of using limma the homoscedasticity of my data(test for unequal variances) regarding the two groups(cancer and control samples) ?

Any advice or consideration on this matter would be grateful !!

13
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

As Ryan has explained to you, limma does now allow you to formally allow for unequal variances between groups, usually a higher variance in the cancer group. You do this by running arrayWeights() with var.design indicating the groups before you run lmFit() and eBayes(). However I want to explain to you why this is usually not necessary.

Unequal variances are not usually a major concern in microarray data. The two-sample t-test is known to be highly robust against unequal variances, and the limma moderated t-test inherits this property. Markowski and Markowski (American Statistician, 1990) concluded that

"the t test is insensitive to variance heterogeneity and hence no preliminary test [for unequal variances] is necessary."

George Box (Biometrika, 1953) put it more colorfully. He said

“To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!”

The main cost of unequal variances is loss of power rather than failure to control the error rate. If the variances truly are unequal between your control and cancer groups, then default limma may be a bit conservative, but usually it will still be controlling the FDR correctly. This is not usually something to worry about. The paper that you cite (Jeanmougin et al, PLOSONE 2010) found limma to be very robust and did not recommend that you need to check for homoscedasticity.

You should of course do an MDS plot of the samples using plotMDS in limma. Do the cancer samples look more dispersed than the normal samples? If so then you can probably increase power to detect more DE by using the arrayWeights function to estimate the heteroscedasticity.

My guess is that you already have a huge number of DE genes between normal and cancer. Do you need even more? I would personally adjust for heteroscedasticity only when the cancers appear very much more variable than the control samples.

Alternatively you could run arrayWeights without var.design if there are individual samples that appear to be outliers. Again, this tends to increase power to detect DE. This is the most common use of the arrayWeights function.

6
Entering edit mode
@ryan-c-thompson-5618
Last seen 22 months ago
Scripps Research, La Jolla, CA

By default, limma is fitting a homoskedastic model -- it is assuming equal variance across all groups. And my understanding is that a homoskedastic test on heteroskedastic data will produce excess false positives (Edit: See Gordon's answer for a more informed opinion on this). But recent updates to limma have added the capability to fit heteroskedastic models by computing weights for each group of samples or each sample individually. I believe the relevant function is called arrayWeights.

0
Entering edit mode

Dear Mr Ryan C.Thompson,

thank you for your suggestion. Besides the above implementation in limma which i will search, is there also another test to check for homo- or heteroscedasticity prior to limma testing ?

0
Entering edit mode

Formal tests of homoscedasticity are not a good idea, because such tests make assumptions of their own that are stronger than the original assumptions you are trying to check (see Box's quote about row boats and ocean liners).  Instead, just look at the MDS plot.