Error in limma analysis - estimated df (df.0) for prior value of variance (s2.0) is Inf
cemcdono

I am trying to use Limma to analyze differential expression in processed, log2 transformed proteomics data. I have been following this code: http://www.biostat.jhsph.edu/~kkammers/software/CVproteomics/R_guide.html

However, for some of the analyses the p-values look very strange, and the volcano plots show no spread: the points all fall on a single line.

Looking deeper into the analysis outputs, something appears to be going wrong at the eBayes step. s2.post (the posterior variance) is exactly the same as s2.0 (the estimated prior variance) and does not differ between genes, and df.0 (the degrees of freedom for s2.0) is Inf.

Any insight into what might be going wrong, or how I can troubleshoot it, would be appreciated!

Aaron Lun

Firstly, it's good practice to post the minimal code required to reproduce a problem, rather than expecting people to read other resources.

Secondly, an infinite prior d.f. from limma is not an error. It is entirely possible to obtain this value if all genes have the same true variance, such that all sample variances are shrunk completely to the common variance estimate. I don't have experience with proteomics data, but an infinite prior d.f. is not unusual in ChIP-seq analyses with a related framework (i.e., edgeR).

The thing you have to watch out for is whether this is due to the presence of a low-quality sample. One aberrant sample may inflate the variability, causing all genes to have large variances of similar magnitude, which leads to the prior d.f. being estimated as infinite. So, inspect your samples with an MDS plot to see if any of them are doing something funny, and maybe compute array weights with arrayWeights() to reduce the influence of problematic samples.

@gordon-smyth

Just to add to Aaron's answer, this is not an error from limma. If all the variances in your experiment appear to be the same to within sampling variability, then limma will treat them as the same and will pool all the protein-wise variances to get a combined variance estimate. This is why you have found all the protein-wise estimates to be equal. This is not a mistake; it is how the algorithm is supposed to work.
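The pooling can be read off the usual empirical Bayes formula for the moderated variance. The symbols below are notation assumed for this note: $s_g^2$ and $d_g$ are the residual variance and residual d.f. for protein $g$ (sigma^2 and df.residual in a limma fit), $s_0^2$ and $d_0$ are the prior variance and prior d.f. (s2.prior and df.prior, presumably the s2.0 and df.0 columns in the question), and $\tilde{s}_g^2$ is s2.post.

$$
\tilde{s}_g^2 = \frac{d_0\, s_0^2 + d_g\, s_g^2}{d_0 + d_g},
\qquad
\lim_{d_0 \to \infty} \tilde{s}_g^2 = s_0^2 .
$$

So when the prior d.f. is estimated as infinite, every protein's posterior variance collapses to the same prior value, which matches what the question describes.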

However, expression data seldom show equal variances, and this occurrence is very likely to be a symptom of a problem with your data or with the earlier steps in your analysis. For example, it may be that there is a substantial batch effect in your data that makes all the variances consistently large. In that case, limma may set the prior variance (s2.prior) large and df.prior=Inf if the variances appear to be too consistent to be chi-square distributed.

To troubleshoot this, you should:

1. Plot the samples with plotMDS() and look for a batch effect.
2. Run eBayes() with trend=TRUE and robust=TRUE.
3. Make a plot of the protein-wise variances with plotSA(fit) after the eBayes() step.
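The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the matrix `y`, the two-group factor `group`, and the simulated values are all made up here and stand in for a real log2 protein matrix and its design.

```r
library(limma)

set.seed(1)

# Hypothetical stand-in data: 500 proteins x 6 samples on the log2 scale.
n_prot <- 500
group  <- factor(rep(c("ctrl", "trt"), each = 3))

# Protein-specific true variances (scaled inverse chi-square) and means.
sigma2 <- 0.25 * 4 / rchisq(n_prot, df = 4)
mu     <- runif(n_prot, 15, 25)
y <- matrix(rnorm(n_prot * 6, mean = mu, sd = sqrt(sigma2)),
            nrow = n_prot,
            dimnames = list(paste0("prot", seq_len(n_prot)),
                            paste0("s", 1:6)))

design <- model.matrix(~ group)

# 1. MDS plot of the samples: look for batches or outlying samples.
plotMDS(y, labels = colnames(y), col = as.integer(group))

# 2. Fit the linear model, then moderate the variances allowing for
#    a mean-variance trend and for outlier variances.
fit <- lmFit(y, design)
fit <- eBayes(fit, trend = TRUE, robust = TRUE)

# 3. Protein-wise residual standard deviations against average
#    log-expression, with the fitted prior trend overlaid.
plotSA(fit)

# Inspect the estimated prior d.f.; Inf here means complete pooling.
summary(fit$df.prior)
```

With real data, `y` would be the processed log2 intensity matrix and `design` would come from the actual experimental layout.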

If there is a batch effect, you should add extra terms to the linear model to account for the extra effects. If there are one or two outliers, you could use arrayWeights() as Aaron suggests to downweight them.
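As a rough sketch of both remedies together, a batch term can be added to the design and sample weights passed to the fit. The data, the `batch` and `group` factors, and the size of the batch shift are all assumptions invented for this example.

```r
library(limma)

set.seed(2)

# Hypothetical layout: 8 samples, 2 batches, 2 groups balanced within batch.
n_prot <- 300
group  <- factor(rep(c("ctrl", "trt"), times = 4))
batch  <- factor(rep(c("b1", "b2"), each = 4))

# Simulated log2 data with protein-specific means and variances.
sigma2 <- 0.25 * 4 / rchisq(n_prot, df = 4)
mu     <- runif(n_prot, 15, 25)
y <- matrix(rnorm(n_prot * 8, mean = mu, sd = sqrt(sigma2)), nrow = n_prot)

# Add a protein-specific shift to every sample in the second batch.
y[, batch == "b2"] <- y[, batch == "b2"] + rnorm(n_prot, sd = 1)

# Batch enters the linear model as an extra term.
design <- model.matrix(~ batch + group)

# Estimate relative quality weights for each sample, then use them in the fit.
aw  <- arrayWeights(y, design)
fit <- lmFit(y, design, weights = aw)
fit <- eBayes(fit, trend = TRUE, robust = TRUE)

# Test the group effect adjusted for batch.
topTable(fit, coef = "grouptrt")
```

Samples with weights well below 1 in `aw` are the ones being downweighted, so it is worth looking at that vector alongside the MDS plot.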

You should really be doing these things for any dataset. Just running DE code without doing some plots to check the data isn't a good idea.