Hi,
I am trying to perform differential gene expression analysis with edgeR. I have four conditions T1, T2, T3 and T4 and two batches. I am running the following codes:
Count1=read.table("GENESFORANALYSIS.txt", sep="\t", header=TRUE)
colnames(Count1) = paste(c("Genename", "T1R1","T1R2","T1R3",
"T2R1","T2R2","T2R3",
"T3R1","T3R2","T3R3",
"T4R1","T4R2","T4R3"))
conditions = c(rep("T1", 3), rep("T2", 3), rep("T3", 3), rep("T4",3))
batch=c("A", "B", "A",
"A", "A", "B",
"A", "A", "B",
"A", "A", "B")
y=DGEList(counts=Count1[,-1], genes=Count1[,1],group=conditions)
cpm=10/(min(y$samples$lib.size)/10^6)
keep = rowSums(cpm(y) >= 1.105529) >= 3
y = y[keep,,keep.lib.sizes=FALSE]
design=model.matrix(~ batch + conditions, data=y$samples)
rownames(design) = colnames(y)
z = calcNormFactors(y)
z = estimateGLMCommonDisp(z, design, verbose=TRUE)
z = estimateGLMTrendedDisp(z, design)
z = estimateGLMTagwiseDisp(z, design)
fit1 = glmFit(z, design)
lrt1 = glmLRT(fit1, coef=3)
tt=topTags(lrt1, n=nrow(z))
head(tt$table)
summary(decideTests(lrt1))
My model matrix is: (1)
design=model.matrix(~ batch + conditions, data=y$samples)
(Intercept) batchB conditionsT2 conditionsT3 conditionsT4
T1R1 1 0 0 0 0
T1R2 1 1 0 0 0
T1R3 1 0 0 0 0
T2R1 1 0 1 0 0
T2R2 1 0 1 0 0
T2R3 1 1 1 0 0
T3R1 1 0 0 1 0
T3R2 1 0 0 1 0
T3R3 1 1 0 1 0
T4R1 1 0 0 0 1
T4R2 1 0 0 0 1
T4R3 1 1 0 0 1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"
attr(,"contrasts")$conditions
[1] "contr.treatment"
(2)
design=model.matrix(~ 0+ batch + conditions, data=y$samples)
batchA batchB conditionsT2 conditionsT3 conditionsT4
T1R1 1 0 0 0 0
T1R2 0 1 0 0 0
T1R3 1 0 0 0 0
T2R1 1 0 1 0 0
T2R2 1 0 1 0 0
T2R3 0 1 1 0 0
T3R1 1 0 0 1 0
T3R2 1 0 0 1 0
T3R3 0 1 0 1 0
T4R1 1 0 0 0 1
T4R2 1 0 0 0 1
T4R3 0 1 0 0 1
attr(,"assign")
[1] 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"
attr(,"contrasts")$conditions
[1] "contr.treatment"
Now, what I need to know. 1) I need to know how to compare T1 vs T2, T1vs T3, T1vs T4, T2vsT3, T2vs T4, T3vs T4 using the commands (a) contrast= c(0, 0, 0, 0, 0) as well as coef=3/4/5/3:4/etc. 2) I need to know if my models (with or without intercepts) will include the batch effects while giving me the names of the differentially expressed genes?
I would like someone to kindly reply to me. I have checked the edgeR manual and not able to understand it. Thanks a lot. Arumoy.
It would be easiest to write the design as
0 + conditions + batch
so switching position of batch and condition. With this design you can make all your desired contrasts while properly accounting for the batch. The edgeR maintainers will probably add their expert opinions on that, but isn't the QLF framework the preferred pipeline these days rather than the LRT test?