Hello dear community,
There has been a similar post: "modeling heteroscedasticity in limma-voom for RNA-Seq data analysis" but the topic was different.
I have proteomics data over time, so every row of my matrix is a protein, and the columns are timepoints and replicates. Additionally, I have two conditions (groups) which are growth phases of the cells in the bioreactor (exponential and stationary). Column 1-18 consist of all the samples from the exponential condition, and column 19-36 consist of the samples of the stationary condition. Column 1 is timepoint 1 (T1) replicate 1 (R1), column 2 is T1-R2, column 3 is T1-R3, column 4 is T2-R1, and so on. So I have three replicates per timepoint. And this starts anew at column 19, at the condition stationary, so there I have the exact same sampling scheme.
I model the proteins over time with splines, and I use limma for that. However, do increase statistical power, and to also be able to test for interactions between the conditions, I model the two conditions together. For this, I am for example using a design formula like this: "~ 1 + Condition*Time + Replicates". Then, as a result, I get the info which proteins are changed significantly over time in the exponential and the stationary condition (both separately).
However, I noticed that the variance of the residuals is for ~20% of the proteins not the same between both conditions, so either the exponential of stationary condition has a higher variance between the residuals of the measurement points of that condition. In other words, ~20% of the proteins have the problem of heteroscedasticity between the conditons. I found this out with Levenes test.
My question what I should do now? Which appropriate strategy does limma offer that can deal with this problem? Or should I just ignore this?
I would appreciate any help!
Best, Thomas
Thank you very much for your response, Mr. Smyth. The info you provide is very helpful.
I have only one followup question:
I tried using the weights from the arrayWeigths() function, like this:
but this does not change the results (with the default method = "ls"). I inspected the weights, and some samples only have a weight of like 0.25, so I guess it should do something. I also set all weights but one to the value of 1, and the one weight to 9999, to see if something happens, but it also did not change the results. Only when I select method = "robust", then the weights change the results (but I would like to use the method = "ls".
There is absolutely no doubt that setting weights will change the results from lmFit(). This is long-standing functionality in the limma package. There are examples of its use in the limma User's Guide.
Thank you!