Dear all,
please could you advise me on the following analysis, whether it is correct or optimal:
briefly, speaking, I have 3 pairs of RNA-seq samples (WT vs KO), that have been sequenced at different time points.
More precisely, let's name the samples : WT1; WT2 ; WT3; KO1 ; KO2 ; KO3,
where the pairs are (WT1, KO1), (WT2, KO2), (WT3, KO3) that have been sequenced at different times by different technologies (there is a batch effect, as shown also by PCA or MDS display).
When i do the differential analysis of gene expression, I take into account the pairs WT-KO, that in this case coincide with the batches. The question would be : is the R code below correct in using paired design and doing the batch correction, or shall you have any suggestions, please let me know ;)
library("edgeR")
eset <- read.delim("RNAseq",row.names="Geneid")
group <- factor(c("wt","wt","wt","ko","ko","ko"))
group <- relevel(group,ref="wt")
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))
design <- model.matrix(~group+subject)
y <- DGEList(counts=eset, group=group)
keep <- rowSums(cpm(y) > 0.5) >= 6
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
### MDS before batch correction
logCPM <- cpm(y, log=TRUE, prior.count=5)
plotMDS(logCPM, col=as.numeric(batch))
### MDS after batch correction
logCPM <- removeBatchEffect(logCPM, batch=batch)
plotMDS(logCPM, col=as.numeric(batch))
#############
fit <- lmFit(logCPM, design)
fit <- eBayes(fit,trend=TRUE, robust=TRUE)
pdf("after.batch.correction.plotSA")
plotSA(fit)
dev.off()
results <- topTable(fit, coef=2, adjust="fdr", number=Inf)
write.table(results, file="results.edgeR.txt",
sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)
Dear Gordon, thank you very much for a quick and helpful reply ;) A question though please, although I see of course that the paired analysis accounts for the batch effects
would it be fine to keep these lines of code below (before lmFit (
logCPMc, design))
:logCPMc <- removeBatchEffect(logCPM, batch=batch)
plotMDS(logCPM, col=as.numeric(batch))
as the results seem to look better on the MDS plot after
removeBatchEffect
. thank you !You can use removeBatchEffect() for the MDS plot. That's the main purpose of the function, for plotting purposes.
Dear Gordon, thanks a lot for your help ;) Have a good and fruitful week ;)
In this case, subject and batch are identical:
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))
But how about the following example?
subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,1,1,2,2,1))
What is the best model design to correct batch effect and to predict DEG between WT and KO groups?
Please start a new post for a new question.