Dear all,
please could you advise me on the following analysis, whether it is correct or optimal:
briefly, speaking, I have 3 pairs of RNA-seq samples (WT vs KO), that have been sequenced at different time points.
More precisely, let's name the samples : WT1; WT2 ; WT3; KO1 ; KO2 ; KO3,
where the pairs are (WT1, KO1), (WT2, KO2), (WT3, KO3) that have been sequenced at different times by different technologies (there is a batch effect, as shown also by PCA or MDS display).
When i do the differential analysis of gene expression, I take into account the pairs WT-KO, that in this case coincide with the batches. The question would be : is the R code below correct in using paired design and doing the batch correction, or shall you have any suggestions, please let me know ;)
library("edgeR")
eset <- read.delim("RNAseq",row.names="Geneid")
group <- factor(c("wt","wt","wt","ko","ko","ko"))
group <- relevel(group,ref="wt")
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))
design <- model.matrix(~group+subject)
y <- DGEList(counts=eset, group=group)
keep <- rowSums(cpm(y) > 0.5) >= 6
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
### MDS before batch correction
logCPM <- cpm(y, log=TRUE, prior.count=5)
plotMDS(logCPM, col=as.numeric(batch))
### MDS after batch correction
logCPM <- removeBatchEffect(logCPM, batch=batch)
plotMDS(logCPM, col=as.numeric(batch))
#############
fit <- lmFit(logCPM, design)
fit <- eBayes(fit,trend=TRUE, robust=TRUE)
pdf("after.batch.correction.plotSA")
plotSA(fit)
dev.off()
results <- topTable(fit, coef=2, adjust="fdr", number=Inf)
write.table(results, file="results.edgeR.txt",
sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

Dear Gordon, thank you very much for a quick and helpful reply ;) A question though please, although I see of course that the paired analysis accounts for the batch effects
would it be fine to keep these lines of code below (before lmFit (
logCPMc, design)):logCPMc <- removeBatchEffect(logCPM, batch=batch)plotMDS(logCPM, col=as.numeric(batch))
as the results seem to look better on the MDS plot after
removeBatchEffect. thank you !You can use removeBatchEffect() for the MDS plot. That's the main purpose of the function, for plotting purposes.
Dear Gordon, thanks a lot for your help ;) Have a good and fruitful week ;)
In this case, subject and batch are identical:
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))
But how about the following example?
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,1,1,2,2,1))
What is the best model design to correct batch effect and to predict DEG between WT and KO groups?
Please start a new post for a new question.