Returning to edgeR version 3.26.1?
mvanhorn
@834ccfdc
United States

Hello,

I'm trying to return to a previous version of edgeR (3.26.1). I had been using edgeR 3.26.1 and R 3.6.0 within a bioinformatics pipeline implemented on a supercomputing system. After a change to the system, I had to reinstall the pipeline along with new installations of edgeR and R, which defaulted to edgeR 3.32.1 and R 4.0.3. Processing the same data set with the new edgeR/R installation gave me different DE results from the ones I had obtained with edgeR 3.26.1. To keep my analysis consistent, I want to return to the software versions I used when I first ran my data (i.e., edgeR 3.26.1).

I've been able to install R 3.6.0 again, but the compatible Bioconductor version (3.9) only lets me go back to edgeR 3.26.8. Running my analysis with edgeR 3.26.8 also gives different DE results from my initial analysis.

Is it possible to return to edgeR 3.26.1 and if so, how would I accomplish this? Also, were there significant changes made between the 3.26.1 and 3.26.8 versions of edgeR that could account for these differences in DE results?


> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 8 (Core)

Matrix products: default
BLAS/LAPACK: /jet/home/mvanhorn/.conda/envs/ciriq2-env/lib/R/lib/libRblas.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.26.8 limma_3.40.6

loaded via a namespace (and not attached):
[1] compiler_3.6.0  Rcpp_1.0.7      grid_3.6.0      locfit_1.5-9.4
[5] lattice_0.20-44

> packageVersion("BiocManager")
[1] '1.30.16'


Any help or suggestions are greatly appreciated, thank you!

@james-w-macdonald-5106
United States

You are unlikely to be able to get to that particular version easily. The goal for Bioconductor is to have the release versions frozen at release, so they are consistent and available, and the only changes are meant to be bug fixes. That is not true for edgeR, however, as it is updated regularly during the release. The even middle numbers indicate release versions, so the first release version of edgeR in Bioconductor 3.9 would have been 3.26.0, and given that the available version is 3.26.8, there were eight updates to the release version during that cycle. You could, however, use git to get what you want.

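## clone the edgeR source and switch to the Bioconductor 3.9 release branch
git clone https://git.bioconductor.org/packages/edgeR
cd edgeR
git checkout RELEASE_3_9
git pull
## list the commit history to find the commit for each version bump
git log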

commit 836809e043535f2264e5db8b5c0eabcffe85613f
Author: Gordon Smyth <smyth@wehi.edu.au>
Date:   Sun Sep 1 10:40:08 2019 +1000

* edgeR 3.26.8
- In catchSalmon(), use num_valid_targets if num_targets not found in Salmon output.
- Remove calcNormFactors.Rd Note about changing default method.

commit 9a659689e08e4337f80217873a3252c0692f104f
Author: Gordon Smyth <smyth@wehi.edu.au>
Date:   Tue Aug 13 20:45:54 2019 +1000

edgeR 3.26.7

- Update User's Guide to fix design matrix on page 34.

<snip>

commit 4386d92c6b01d11c045327ae9f304d5793c1e495
Author: Gordon Smyth <smyth@wehi.edu.au>
Date:   Fri May 10 18:26:13 2019 +1000

edgeR 3.26.1
- Bug fix to goana.DGELRT and kegga.DGELRT when the LRT was on more than 1 df

## checkout the commit that corresponds to 3.26.1

$ git checkout 4386d92c6b01d11c045327ae9f304d5793c1e495
Note: checking out '4386d92c6b01d11c045327ae9f304d5793c1e495'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 4386d92... edgeR 3.26.1 - Bug fix to goana.DGELRT and kegga.DGELRT when the LRT was on more than 1 df
$ head DESCRIPTION
Package: edgeR
Version: 3.26.1
Date: 2019-05-10


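And now, from the directory that contains the edgeR checkout (cd .. first if you are still inside edgeR/), you can either do

R CMD INSTALL edgeR


from a command line, or start R in that same directory and do

install.packages("edgeR", repos = NULL, type = "source")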


Dear James,

We absolutely do not make changes to edgeR results or syntax within a Bioconductor release. The only changes made to edgeR during a release are

• bug fixes
• documentation improvements
• (occasionally) addition of new options where this does not change existing code pipelines or defaults.

I'll just point out that anyone who wishes can check the changes by visiting https://code.bioconductor.org/browse/edgeR/commits/RELEASE_3_9. This may be useful to anyone wishing to understand whether the difference comes from the edgeR version or from elsewhere.
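The same check can be done locally in the git clone from the answer above. The sketch below lists only the commits between two revisions that touched particular paths, which filters out documentation-only updates. The function name changes_between is hypothetical, and the usage line assumes the standard R package layout (R code under R/, compiled code under src/) and the 3.26.1 and 3.26.8 commit hashes shown in that answer's git log output.

```shell
# Hypothetical helper: list commits in a local git clone that are in the
# range from..to (reachable from "to" but not from "from") and that touched
# any of the given paths.
changes_between() {
  repo=$1
  from=$2
  to=$3
  shift 3
  git -C "$repo" log --oneline "$from..$to" -- "$@"
}

# Usage, assuming ./edgeR is the clone made in the answer above:
# changes_between edgeR 4386d92c6b01d11c045327ae9f304d5793c1e495 \
#   836809e043535f2264e5db8b5c0eabcffe85613f R src
```

An empty result would mean no commit in that range changed the R or C/C++ code; any commits that are listed can then be inspected with git show.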

@gordon-smyth
WEHI, Melbourne, Australia

The only reason we make changes to packages is to improve them or to fix bugs, so I'm a bit dismayed that you're going back to a version of edgeR from more than 2 years ago, although I understand the inconvenience of having results change for a large project. Are there any major changes in the results? Do you absolutely need to reproduce results exactly in all respects? It is often very difficult to do that because other packages change as well, not just edgeR. Even if you could reproduce your old results exactly using old software, I think it would be very problematic to publish biological results that no one can reproduce using more up-to-date software tools.

Within a particular Bioconductor release, we never make changes to the edgeR results except to fix bugs. So there are only two possibilities, either

• the changes in results you are seeing are not actually due to edgeR, or
• the edgeR 3.26.1 results you obtained were slightly incorrect.

In Bioconductor 3.9, we had to make changes to limma and edgeR because of a change to the syntax of the approx and approxfun functions in the core stats package in R 3.6.0. It took us a few tries to make every change that was necessary; the last code change was made in edgeR 3.26.3. We had to make these code changes in order to restore edgeR's behaviour to what it had been before; it was not a change to the edgeR pipeline.

Without knowing which edgeR functions you used, I cannot tell whether you have been affected by the Bioc 3.9 changes or not. Even if you have been affected, I would have thought that differences in DE results would be small. But using edgeR 3.26.8 is the way to be sure that everything is correct.


Our project is a reanalysis of large sets of RNA-seq data to identify circular RNAs using the CIRIquant pipeline, which includes a DE function using edgeR. Specifically, the change we saw was in which circular RNAs were called differentially expressed. In the original analysis, two circRNAs were called DE. In the new analysis, a single, completely different circRNA was called DE. One of the original DE circRNAs (chr1:16891302|16893846) was still present in the output but was no longer called DE by the pipeline, whereas the other one (chrMT:9533|9756) was not found in the output file at all.

Ideally, we would be able to replicate at least which circRNAs are identified as DE, if not all of their associated values (logFC, logCPM, etc.).

I've pasted the first lines of both results files below; the first is from the original analysis and the second is from the new analysis:

[mvanhorn@bridges2-login013 PRJNA283498_files]$ head PRJNA283498_circRNA_de.tsv
,logFC,logCPM,LR,PValue,DE,FDR
chrMT:9533|9756,5.88578993188712,-3.87181433589509,56.9071825832608,4.56874123061433e-14,1,1.64963539613792e-09
chr1:16891302|16893846,-5.63037505540129,-3.65307212295273,38.843220302071,4.59246166378462e-10,-1,8.29100066471356e-06
chr16:56678622|56717142,3.50172460582138,-4.83162436759169,18.9658885995227,1.33076431466765e-05,0,0.145503574937958
chr11:6423312|6524299,3.46502423555818,-4.84458395973169,18.6003430378681,1.61191541737566e-05,0,0.145503574937958
chrMT:8658|8846,3.13736456722681,-4.90739147530223,15.3876600805966,8.75582246709401e-05,0,0.538875327636596
chr9:111898033|111898249,-3.40844536466509,-4.57461054476394,13.2012387037111,0.000279764147458039,0,0.538875327636596
chr17:42990760|42991184,3.1350879662016,-4.72005289779565,12.6471072788946,0.000376147241735292,0,0.538875327636596
chrX:56585032|56758632,2.76187560221825,-4.96727452989565,12.156516377985,0.000489166566177739,0,0.538875327636596
chr1:201452922|201453109,2.91680250980879,-4.552497180473,12.1162436544346,0.000499844906148839,0,0.538875327636596

(ciriq2-env) [mvanhorn@bridges2-login011 mvanhorn]$ head PRJNA283498_circRNA_de_retry2.tsv
,logFC,logCPM,LR,PValue,DE,FDR
chrMT:10783|10923,4.3700430260254,-4.47662427712659,31.2577495187239,2.25943752218658e-08,1,0.000807229243551598
chrMT:5111|11727,3.63755442575343,-4.67619754855895,21.8428465973701,2.95917739666205e-06,0,0.0521867101349185
chr1:16891302|16893846,-4.03855442650442,-4.38324547732434,21.0899852639008,4.3821236153261e-06,0,0.0521867101349185
chr11:6423312|6524299,3.40290937694349,-4.74323512739316,19.1296051560946,1.22135604192294e-05,0,0.109088468274452
chrX:56585032|56758632,2.87568657574393,-4.83257202334906,13.7290982049716,0.000211157528144316,0,0.550245135413007
chr9:111898033|111898249,-3.32670884908044,-4.47902930411712,12.758920733265,0.000354315277443676,0,0.550245135413007
chr1:201452922|201453109,2.8542459168014,-4.43339898259566,12.1111729487293,0.000501205980086786,0,0.550245135413007
chr16:4815566|4829787,-3.01202915644561,-4.70215290962108,12.044102237918,0.000519565039878025,0,0.550245135413007
chr6:32489682|32549615,-2.74338197977742,-4.7689016695743,10.1378392335605,0.001452563335938,0,0.550245135413007
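To compare the DE calls of the two runs over the whole files rather than by eye, a minimal awk-based sketch is below. It assumes the layout shown in the head output above: comma-separated despite the .tsv extension, feature ID in column 1 and the DE call (1, -1 or 0) in column 6. The function names de_calls and compare_de_calls are hypothetical.

```shell
# Hypothetical helper: print the IDs of features whose DE call (column 6)
# is non-zero, skipping the header line, sorted so they can be fed to comm.
de_calls() {
  awk -F, 'NR > 1 && $6 != 0 { print $1 }' "$1" | sort
}

# Hypothetical helper: show which features are called DE in only the first
# file, only the second file, or both.
compare_de_calls() {
  a=$(mktemp)
  b=$(mktemp)
  de_calls "$1" > "$a"
  de_calls "$2" > "$b"
  echo "DE only in $1:"
  comm -23 "$a" "$b"
  echo "DE only in $2:"
  comm -13 "$a" "$b"
  echo "DE in both:"
  comm -12 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage:
# compare_de_calls PRJNA283498_circRNA_de.tsv PRJNA283498_circRNA_de_retry2.tsv
```

Applied to just the rows shown above, this would list chrMT:9533|9756 and chr1:16891302|16893846 as DE only in the original run and chrMT:10783|10923 as DE only in the new run.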


According to the CIRIquant developer, the output results are unfiltered.

As far as I can tell, the approx and approxfun functions were not used within the pipeline (https://github.com/bioinfo-biols/CIRIquant/blob/master/libs/CIRI_DE.R).

I also had to use a different version of limma than before: the first version of limma I used was 3.40.0, whereas the version I have now is 3.40.6. The only other discrepancy between my first and second analyses of this data is that a new version of CIRIquant was released, but I've already contacted the developer about differences between the two versions that could have caused this change. They replied that the main difference between the versions is a commit that prevents some sample-ID-related errors, and that they don't think any of the changes would have affected the final differential expression results.

I am still new to working with bioinformatics packages and differential expression, so there may be something within all this that I have overlooked that could be causing the difference.


The difference you see cannot be due to changes in edgeR between 3.26.1 and 3.26.8.

Your results show a big difference even in the logCPM column, which I think could only occur if the two analyses are of different datasets with different counts.
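One quick way to check this is to line up the logCPM values of circRNAs present in both result files. The awk sketch below is hypothetical (compare_logcpm is not part of edgeR or CIRIquant) and assumes the comma-separated layout of the files shown in the question, with the ID in column 1 and logCPM in column 3.

```shell
# Hypothetical helper: for features present in both files, print the ID,
# the logCPM in the first file, the logCPM in the second file, and the
# difference (second minus first), tab-separated.
compare_logcpm() {
  awk -F, '
    FNR == 1 { next }                 # skip the header line of each file
    NR == FNR { old[$1] = $3; next }  # first file: remember logCPM by ID
    ($1 in old) { printf "%s\t%s\t%s\t%.4f\n", $1, old[$1], $3, $3 - old[$1] }
  ' "$1" "$2"
}

# Usage:
# compare_logcpm PRJNA283498_circRNA_de.tsv PRJNA283498_circRNA_de_retry2.tsv
```

In the rows shown in the question, for example, chr11:6423312|6524299 has a logCPM of -4.84458395973169 in the original run and -4.74323512739316 in the new one. Non-zero differences across many features would point to the two runs having received different input counts or library sizes.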


I have just confirmed there is no difference between edgeR 3.26.1 and edgeR 3.26.8 by running an analysis similar to yours first with edgeR 3.26.1 and then with edgeR 3.26.8. (I actually ran the case study from Section 4.1 of the User's Guide). I got completely identical results from the two edgeR versions (identical to all decimal digits).
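A version comparison like this can be scripted. The sketch below assumes each edgeR version is installed into its own library directory (R CMD INSTALL accepts a --library option, and R_LIBS puts a directory at the front of R's library search path), and that one analysis script writes its results to a file. The script name run_de.R, the lib-* directories and the compare_results helper are all hypothetical; the commit hashes are the 3.26.1 and 3.26.8 commits from the git log output earlier in this thread.

```shell
# Hypothetical helper: report whether two result files are byte-identical;
# if they are not, print the first lines of a diff between them.
compare_results() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
    diff "$1" "$2" | head -n 20
  fi
}

# Usage sketch (run_de.R and the lib-* directories are hypothetical):
# mkdir -p lib-3.26.1 lib-3.26.8
# git -C edgeR checkout 4386d92c6b01d11c045327ae9f304d5793c1e495
# R CMD INSTALL --library=lib-3.26.1 edgeR
# git -C edgeR checkout 836809e043535f2264e5db8b5c0eabcffe85613f
# R CMD INSTALL --library=lib-3.26.8 edgeR
# R_LIBS="$PWD/lib-3.26.1" Rscript run_de.R results-3.26.1.csv
# R_LIBS="$PWD/lib-3.26.8" Rscript run_de.R results-3.26.8.csv
# compare_results results-3.26.1.csv results-3.26.8.csv
```

A byte-for-byte comparison is deliberately strict: it only reports "identical" when every digit matches, which is the level of agreement described above.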