Dealing with heteroscedascity in limma
2
0
Entering edit mode
Thomas • 0
@1a9ea05e
Last seen 4 days ago
Austria

Hello dear community,

There has been a similar post: "modeling heteroscedasticity in limma-voom for RNA-Seq data analysis" but the topic was different.

I have proteomics data over time, so every row of my matrix is a protein, and the columns are timepoints and replicates. Additionally, I have two conditions (groups) which are growth phases of the cells in the bioreactor (exponential and stationary). Column 1-18 consist of all the samples from the exponential condition, and column 19-36 consist of the samples of the stationary condition. Column 1 is timepoint 1 (T1) replicate 1 (R1), column 2 is T1-R2, column 3 is T1-R3, column 4 is T2-R1, and so on. So I have three replicates per timepoint. And this starts anew at column 19, at the condition stationary, so there I have the exact same sampling scheme.

I model the proteins over time with splines, and I use limma for that. However, do increase statistical power, and to also be able to test for interactions between the conditions, I model the two conditions together. For this, I am for example using a design formula like this: "~ 1 + Condition*Time + Replicates". Then, as a result, I get the info which proteins are changed significantly over time in the exponential and the stationary condition (both separately).

However, I noticed that the variance of the residuals is for ~20% of the proteins not the same between both conditions, so either the exponential of stationary condition has a higher variance between the residuals of the measurement points of that condition. In other words, ~20% of the proteins have the problem of heteroscedasticity between the conditons. I found this out with Levenes test.

My question what I should do now? Which appropriate strategy does limma offer that can deal with this problem? Or should I just ignore this?

I would appreciate any help!

Best, Thomas

limma splines Proteomics • 873 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 16 minutes ago
WEHI, Melbourne, Australia

Let me say first that I don't have a lot of confidence in statistical tests for heteroscedasticity. Levene's test is one of the better tests but, in general, tests for variance differences are not robust to outliers or non-normality or multiple testing considerations. So you're actually making more risky assumptions when you test for heteroscedasticity than limma is doing in the original analysis. The limma analysis is fairly robust to heterscedasticity although there's usually a loss of power.

However, if you do think that there is a systematic difference in variability between the conditions, then limma handles this using the arrayWeights() function. You can either estimate a quality weight for each sample, which is the default and usually works well, or you can use the var.group argument to specifically model the variance as a function of the condition. We generally recommend the use of quality weights for large data sets anyway.

You might even consider using the limpa package, which provides an integrated solution for mass spec proteomics data in conjunction with the limma package: https://bioconductor.org/packages/limpa . limpa needs the peptide-level data however for full effect.

ADD COMMENT
0
Entering edit mode

Thank you very much for your response, Mr. Smyth. The info you provide is very helpful.

I have only one followup question:

I tried using the weights from the arrayWeigths() function, like this:

   weights <- limma::arrayWeights(
    object = data,
    design = design_matrix
  )
  fit <- limma::lmFit(
    object = data,
    design = design_matrix,
    weights = weights
  )

but this does not change the results (with the default method = "ls"). I inspected the weights, and some samples only have a weight of like 0.25, so I guess it should do something. I also set all weights but one to the value of 1, and the one weight to 9999, to see if something happens, but it also did not change the results. Only when I select method = "robust", then the weights change the results (but I would like to use the method = "ls".

ADD REPLY
1
Entering edit mode

There is absolutely no doubt that setting weights will change the results from lmFit(). This is long-standing functionality in the limma package. There are examples of its use in the limma User's Guide.

ADD REPLY
0
Entering edit mode

Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 389 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6