Question

Reproducibility using limma for DE Analysis

TSauer • 0

@73304c4d

Last seen 2.1 years ago

Germany

Hey,

I have a question regarding the reproducibility of limma. I am using it for differential expression analysis in proteomics experiments. I also use the decideTests function with the option 'global' to correct for multiple testing across more than one contrast. I am having problems with the reproducibility of the DE calculations: the results I see in RStudio while running and preparing the code in R Markdown differ from the results in my final knitted HTML file. In some cases the differences are quite substantial, e.g. some proteins are deemed significant in RStudio but not in the HTML file, and vice versa. I am aware that R uses an independent session for knitting the markdown document, but in the past (with other packages) I was able to make the results reproducible by setting a seed with set.seed. I tried this with limma as well, but it did not resolve my issue. Am I missing something? For example, in the clusterProfiler::gseGO() function, one has to pass the argument seed = TRUE to make the function seed-sensitive and the results reproducible. Is there an equivalent option for limma that I did not find? I tried to google the problem but somehow couldn't find any leads.

Thank you in advance!

Cheers Thorben

limma seed DifferentialRegulation • 1.5k views

ADD COMMENT • link updated 2.2 years ago by Gordon Smyth 52k • written 2.2 years ago by TSauer • 0

TSauer • 0

@73304c4d

Last seen 2.1 years ago

Germany

Yes, I know that. However, I implemented the imputation step quite a while ago using a different package, exported the data, and then did the DE analysis externally, which is why it was never a real problem. I only recently moved the DE step into the R workflow via limma, which is when the issue arose. It seems that I have already been able to fix it, though.

Thanks again! Thorben

ADD COMMENT • link 2.2 years ago TSauer • 0

score 2 · Accepted Answer · 2023-01-30

Gordon Smyth 52k

@gordon-smyth

Last seen 1 day ago

WEHI, Melbourne, Australia

limma DE analyses have no random component and are completely reproducible. limma generally doesn't have any dependence on the random number seed. The gene set testing functions roast() and romer() are the only exceptions, but they don't have any effect on the main DE analysis.
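As a minimal sketch of the roast() exception, assuming simulated data and an arbitrary, hypothetical gene set (rows 1 to 20), the rotation p-values only repeat exactly when the seed is fixed immediately before each call:

```r
library(limma)

# Simulated log-expression matrix (hypothetical data, for illustration only)
set.seed(1)
y <- matrix(rnorm(100 * 6), nrow = 100)
design <- model.matrix(~ factor(rep(c("ctrl", "trt"), each = 3)))

# roast() uses random rotations, so fix the seed right before each call
set.seed(123)
r1 <- roast(y, index = 1:20, design = design, contrast = 2, nrot = 999)
set.seed(123)
r2 <- roast(y, index = 1:20, design = design, contrast = 2, nrot = 999)

# Same seed, same rotations, same p-values
same_pvalues <- identical(r1$p.value, r2$p.value)
```

Without the set.seed() calls, the two p-value tables would generally differ slightly from run to run.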

If there is variation in your analysis runs then the random component would have to be part of an earlier analysis step. For example, are you doing imputation before running limma? Imputation algorithms often have a random component.
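The following sketch illustrates the point under stated assumptions: the data are simulated, and impute_random() is a hypothetical stand-in for whatever imputation method is actually used (here, random draws from a down-shifted normal distribution). It is not taken from the original poster's workflow.

```r
library(limma)

# Simulated proteomics-like matrix with missing values (hypothetical data)
set.seed(1)
y <- matrix(rnorm(200 * 6, mean = 20), nrow = 200,
            dimnames = list(paste0("prot", 1:200), paste0("s", 1:6)))
y[sample(length(y), 100)] <- NA

group  <- factor(rep(c("A", "B", "C"), each = 2))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
contr  <- makeContrasts(BvsA = B - A, CvsA = C - A, levels = design)

# Hypothetical imputation: replace NAs with draws from a down-shifted normal.
# The random draws are the only stochastic step in this pipeline.
impute_random <- function(x, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  miss <- is.na(x)
  mu <- mean(x, na.rm = TRUE)
  s  <- sd(x, na.rm = TRUE)
  x[miss] <- rnorm(sum(miss), mean = mu - 1.8 * s, sd = 0.3 * s)
  x
}

# The limma part: linear model, contrasts, moderated t, global test decisions
run_de <- function(x) {
  fit <- eBayes(contrasts.fit(lmFit(x, design), contr))
  decideTests(fit, method = "global")
}

imp_a <- impute_random(y, seed = 42)
imp_b <- impute_random(y, seed = 42)  # same seed: same imputed values
imp_c <- impute_random(y)             # no seed: different imputed values

seeded_match   <- identical(imp_a, imp_b)
unseeded_match <- identical(imp_a, imp_c)
limma_repeats  <- identical(run_de(imp_a), run_de(imp_a))
```

In an R Markdown document, the equivalent fix would be to call set.seed() in the same chunk as the imputation, directly before it, so that the interactive session and the knitting session start the imputation from the same random state.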

ADD COMMENT • link 2.2 years ago Gordon Smyth 52k

Hey Gordon, thank you for the swift response. Indeed, I do perform imputation. Weirdly, it did not occur to me that this could be the origin of the problem, but it apparently is. Thank you for pointing me in the right direction; I will try to fix it by making the imputation reproducible.

All the best Thorben

ADD REPLY • link 2.2 years ago TSauer • 0

limma can also be run without imputation, i.e., leaving the missing values as NA.
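A minimal sketch of this, assuming simulated data with one missing value in each of the first 40 proteins (the matrix, group labels and NA pattern are made up for illustration):

```r
library(limma)

# Simulated log-intensities: 100 proteins x 6 samples (hypothetical data)
set.seed(1)  # only used to generate the example data
y <- matrix(rnorm(100 * 6, mean = 20), nrow = 100,
            dimnames = list(paste0("prot", 1:100), paste0("s", 1:6)))

# Leave missing values as NA: one NA in each of the first 40 proteins
y[cbind(1:40, rep(1:6, length.out = 40))] <- NA

group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# lmFit() fits each protein using only its observed values,
# so no imputation (and no random step) is involved
fit <- eBayes(lmFit(y, design))
res <- topTable(fit, coef = "grouptrt", number = Inf, sort.by = "none")

# Proteins with a missing value simply have fewer residual degrees of freedom
table(fit$df.residual)
```

Proteins with too few observed values to estimate a coefficient would get NA results for it, so some filtering on the number of observed values per group may still be needed in real data.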

ADD REPLY • link 2.2 years ago Gordon Smyth 52k