I'm analysing RNA-seq data with Limma-Voom and after reading and apply different methods of analysis I have arrived at two different results and I'm not sure which one is correct. Here the specs: n=60 (~30 per tissue) The experiment (Multi-level time course experiment):
dataForAnalysis = data.frame(Sample=c("s_1", "s_1", "s_2", "s_2", "s_1", "s_1", "s_2", "s_2"), Tissue=c("A", "A","A", "A", "B", "B", "B", "B"), Condition=c("before", "after","before", "after","before", "after","before", "after"))
Sample | Tissue | Condition |
---|---|---|
s_1 | A | before |
s_1 | A | after |
s_2 | A | before |
s_2 | A | after |
s_1 | B | before |
s_1 | B | after |
s_2 | B | before |
s_2 | B | after |
Analysis No. 1
designMatrix = model.matrix(~ Tissue + Condition, dataForAnalysis)
nf = calcNormFactors(counts)
forDgeData.voom <- voom(counts, designMatrix, plot=False, lib.size=colSums(counts)*nf)
forDgeData.dups = duplicateCorrelation(forDgeData.voom, designMatrix, block= dataForAnalysis$Sample)
forDgeData.fitRobust = lmFit(forDgeData.voom, forDgeData.voom$design, correlation= forDgeData.dups$consensus, block= dataForAnalysis$Sample, method='robust')
forDgeData.ebayes = eBayes(forDgeData.fitRobust, robust=True)
Using this method, these are my results:
summary(decideTests(forDgeData.ebayes))
(Intercept) TissueA ConditionAfter
-1 1615 6594 22
0 446 1573 15631
1 13631 7525 39
From these results, I'm taking the ones in the "ConditionAfter" to get the DE genes between the conditions using:
toptable(forDgeData.ebayes, coef="ConditionAfter", ...)
Analysis No. 2 (For this analysis I pasted tissue and condition for the design matrix and use contrasts)
predesignMatrix = factor(paste(dataForAnalysis$Tissue, dataForAnalysis$Condition, sep='.'))
designMatrix = model.matrix(~ 0 + predesignMatrix)
nf = calcNormFactors(counts)
forDgeData.voom <- voom(counts, designMatrix, plot=False, lib.size=colSums(counts)*nf)
forDgeData.dups = duplicateCorrelation(forDgeData.voom, designMatrix, block= dataForAnalysis$Sample)
forDgeData.fitRobust = lmFit(forDgeData.voom, forDgeData.voom$design, correlation= forDgeData.dups$consensus, block= dataForAnalysis$Sample, method='robust')
forDgeData.ebayes = eBayes(forDgeData.fitRobust, robust=True)
designMatrix.forContrast = makeContrasts(diff_Before_vs_After = (A.after-A.before)-(B.after-B.before), levels= designMatrix)
forDgeData.fitRobust.fitRobust2 = contrasts.fit(forDgeData.fitRobust, designMatrix.forContrast)
forDgeData.ebayes = eBayes(forDgeData.fitRobust.fitRobust2)
Using this method, these are my results:
summary(decideTests(forDgeData.ebayes))
diff_Before_vs_After
-1 7
0 15680
1 5
So, comparing results from Analysis No. 1 and No. 2, only 3 genes overlap. My main question here is if there are DE genes between the conditions (before and after), taking into account the tissue factor and also that the samples are duplicated.
Any help or advice... more than welcome!
Thanks
Aaron answered your main question already, but a minor comment: note that Gordon suggests to use
duplicateCorrelation
withvoom
by performing these calls twice each using two passes over the data:A: using duplicateCorrelation with limma+voom for RNA-seq data
Also, you should split the limma and
voom
tags, otherwise the maintainers don't get notified.