Hello,
A colleague and I are performing differential expression analysis on a dataset comprising two groups, controls and cases. We wanted to test and correct for batch effects, so we decided to use SVA, and we were both obtaining the initial files from a shared folder.
We were, however, getting different results. We first thought it might be due to small differences in our scripts, but even when running the exact same script the difference persisted, both in the post-SVA correction and in the DESeq2 results (prior to SVA).
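For context, here is a minimal sketch of the kind of workflow we are both running (object names like counts_mat and sample_info, and the choice of n.sv = 2, are illustrative, not our exact script):

library(DESeq2)
library(sva)

# Standard DESeq2 run (condition = control vs case)
dds <- DESeqDataSetFromMatrix(countData = counts_mat,
                              colData   = sample_info,
                              design    = ~ condition)
dds <- DESeq(dds)
res <- results(dds)

# Estimate surrogate variables on the normalised counts
dat  <- counts(dds, normalized = TRUE)
dat  <- dat[rowMeans(dat) > 1, ]
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design and re-run
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
ddssva <- DESeq(ddssva)
ressva <- results(ddssva)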
For comparison, my results:
With DESeq2 package version: 1.16.1 and R version 3.4.0 (2017-04-21):
> summary(res)
out of 47541 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 41, 0.086%
LFC < 0 (down) : 9, 0.019%
outliers [1] : 168, 0.35%
low counts [2] : 29741, 63%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
And post SVA:
> summary(ressva)
out of 47541 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 6, 0.013%
LFC < 0 (down) : 4, 0.0084%
outliers [1] : 0, 0%
low counts [2] : 5378, 11%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
And her results:
With DESeq2 package version: 1.10.1 and R version 3.2.4 (2016-03-10):
> summary(res)
out of 47541 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 26, 0.055%
LFC < 0 (down) : 3, 0.0063%
outliers [1] : 287, 0.6%
low counts [2] : 30152, 63%
(mean count < 12.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
And post SVA:
> summary(ressva)
out of 47541 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 34, 0.072%
LFC < 0 (down) : 5, 0.011%
outliers [1] : 0, 0%
low counts [2] : 30270, 64%
(mean count < 12.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
If we print the dds object we also see some differences:
Mine:
> dds
class: DESeqDataSet
dim: 65988 12
metadata(1): version
assays(3): counts mu cooks
rownames(65988): ENSG00000000003 ENSG00000000005 ... ENSG00000282821 ENSG00000282822
rowData names(21): baseMean baseVar ... deviance maxCooks
colnames(12): c1 c2 ... m5 m6
colData names(2): condition sizeFactor
Hers:
> dds
class: DESeqDataSet
dim: 65988 12
exptData(0):
assays(3): counts mu cooks
rownames(65988): ENSG00000000003 ENSG00000000005 ... ENSG00000282821 ENSG00000282822
rowRanges metadata column names(27): baseMean baseVar ... deviance maxCooks
colnames(12): c1 c2 ... m5 m6
colData names(2): condition sizeFactor
After some troubleshooting to make sure the files were synced and everything was working properly, we decided to check whether it was due to the version difference, and as soon as she upgraded her DESeq2 package and R versions we got the same results.
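For reference, the version check itself was just the usual calls (nothing specific to our data):

R.version.string
packageVersion("DESeq2")
packageVersion("sva")
# or the full record, useful for pasting into a report:
sessionInfo()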
Another small thing: we checked the vectors we get from SVA for the correction and they are the same, independent of the difference in the DESeq2 results (which I find curious, since the various metrics (log2FC, post-normalisation counts, adjusted p-values) do differ).
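In case it matters, that check was essentially a direct comparison of the sv matrices from our two runs (svseq_mine and svseq_hers are hypothetical names for the two saved objects):

# surrogate variables from each run
all.equal(svseq_mine$sv, svseq_hers$sv)
# or the largest element-wise difference
max(abs(svseq_mine$sv - svseq_hers$sv))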
So, I think my question for you is whether this big a difference is "normal" given these version differences, and perhaps some insight into why we would see them.
Thank you for your time,
Cheers,
David M. F. Francisco
Thanks for the answer!
Yeah, that is why I said "normal".
What intrigues me is the difference in the post-SVA correction, since the surrogate-variable vectors we obtain are the same, yet the results diverge quite a lot across all metrics.
Anyway,
Thanks again for the answer.