I'm trying to reproduce an old analysis I did using the Bioconductor package limma. I've cloned the git repo and I do see the limma versions that are part of Bioconductor releases, but I don't see versions 3.43.4 or 3.43.5 there. I also don't see these versions in the CRAN archive, and Google hasn't produced any other leads.
Does someone know how I can download these versions?
The limma versions that you mention were developmental versions rather than release versions of limma and you should not be using those for analyses unless you are a Bioconductor developer yourself. Anyway, there have been no changes to limma since those versions that would change the results of a standard DE analysis with a full-rank design matrix and estimable coefficients.
As well as the git log, you can see a human-friendly summary of each change for every version of limma from limma::changeLog(). If you only need to see a summary of the changes corresponding to each Bioconductor release, then:
I had looked at the changelog, but only now realize that the versions I asked for above were only in the development branch. I do not use development versions of packages, so apparently I did not use the versions I asked for above.
I analyzed a dataset with NAs using limma arrayWeights & contrasts.fit in May 2019 (although I'm not sure which package versions I used) and get similar results using the same code today (the t-statistics are off by approximately 0.05) but not identical results. It sounds like the difference should not be due to limma, right?
The most common cause of changes in limma results is changes in the annotation packages (Bioconductor or otherwise), which often cause slight changes in the number of genes included in an analysis. However, there have been changes to the arrayWeights and contrasts.fit functions over time.
In particular, a small but pervasive improvement was introduced to arrayWeights with the Bioconductor 3.9 release in May 2019.
If your old analysis was run using Bioconductor 3.8 or earlier, then you could try adding prior.n=0 to the arrayWeights() call to see if it restores the old results.
Changes in Bioconductor 3.12 (limma 3.46.0 Oct 2020)
Bug fix to arrayWeights() when 'y' contains NAs, the design matrix has several columns and some but not all genes have no residual df.
Changes in Bioconductor 3.11 (limma 3.44.0 May 2020)
Improved treatment of NA coefficients by contrasts.fit(). contrasts.fit() has always removed coefficients that were NA because of singularities in the design matrix, but any extra NA coefficients caused by NA expression values would cause all the stdev.unscaled values returned by contrasts.fit() to be NA for those genes. contrasts.fit() now returns non-NA coefficients and non-NA stdev.unscaled values if all the NA coefficients are multiplied by zero contrast multipliers.
Changes in Bioconductor 3.9 (limma 3.40.0 May 2019)
Major rewrite of arrayWeights() to improve speed and stability. The arrayWeightsSimple() function has been removed and its functionality incorporated into arrayWeights(). A new 'var.group' argument has been added to simplify specification of the variance design matrix. The weights are now squeezed slightly towards unity and a new argument 'prior.n' has been added to control the prior weight with which the array weights are squeezed. arrayWeights() now chooses between the REML and gene-by-gene algorithms automatically by default. REML is chosen when there are no prior weights or missing values and gene-by-gene is used otherwise. arrayWeights() now checks for and skips over any genes with zero residual variances.
Having said all that, I do analyses for collaborators all the time but I just wouldn't consider trying to reproduce an analysis exactly from 3 years ago. It is a great deal of work because you have to reproduce precisely both R and every single package that you had installed at the time. If you don't have a record of what package versions you had at the time, then it's an impossible task. Meanwhile, the new results are only slightly different and almost certainly better.
Setting aside the need for an exact version of limma, it's simple enough to get whichever one you want.
## After having done
git clone https://git.bioconductor.org/packages/limma
## and going into the limma directory
$ git log | grep -n 3.43.4
539: 16 Feb 2020: limma 3.43.4
7860:Date: Wed May 27 23:43:46 2009 +0000
## so we want to be around line 539
$ git log | sed -n '530,540p'
the presence of non-estimable coefficients sometimes caused the
cov.coefficient matrix to be subsetted incorrectly; that is now
fixed. Subsetting zero columns is now allowed even when F-statistics
are present.
commit fb2ce173d94fee80707ba421876c85530f2ad112 <------------------------ the important thing
Author: Gordon Smyth <smyth@wehi.edu.au>
Date: Sun Feb 16 21:16:40 2020 +1100
16 Feb 2020: limma 3.43.4
$ git checkout fb2ce173d94fee80707ba421876c85530f2ad112
Note: switching to 'fb2ce173d94fee80707ba421876c85530f2ad112'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at fb2ce17 16 Feb 2020: limma 3.43.4
And now you can install and do your analysis.
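The grep-then-sed search above can also be done with git's own message filter. This is only a minimal sketch, assuming the version-bump commits carry messages such as "limma 3.43.4" as in the log shown; `find_version_commit` is a hypothetical helper name, not part of git or limma, and the install step is left as a comment because it needs R.

```shell
#!/bin/sh
# Hypothetical helper: print the hash of the newest commit whose message
# contains the given literal string (for example "limma 3.43.4").
find_version_commit() {
    # -F makes --grep a fixed-string match, so the dots in a version number
    # do not match arbitrary characters (plain grep 3.43.4 also hit "23:43:46").
    # -n 1 keeps only the most recent matching commit; %H prints its full hash.
    git log --all -F --grep="$1" -n 1 --format=%H
}

# Usage, run from inside the cloned limma directory:
#   commit=$(find_version_commit "limma 3.43.4")
#   git checkout "$commit"
#   R CMD INSTALL .    # assumes R is installed and on the PATH
```

The match is a substring match on the commit message, so it is worth checking the printed hash with `git show` before checking it out.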
However, there have been 54 commits to limma in the intervening period, and presumably they weren't made for arbitrary reasons. One could argue that using the current release version of limma is materially better than using an old version, even if some of the results change. OTOH, I work in a core, and have people ask me to re-run old analyses to add new contrasts or whatever, and they tend to be less than happy if I do so and some of their existing gene lists change, so I get not wanting things to change...
Thanks for your helpful & quick reply.