correction for batch effects and paired-design of RNA-seq samples
1
1
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 14 months ago
Palo Alto, CA, USA

Dear all, 

please could you advise me on the following analysis, whether it is correct or optimal:

briefly, speaking, I have 3 pairs of RNA-seq samples (WT vs KO), that have been sequenced at different time points.

More precisely, let's name the samples : WT1; WT2 ; WT3;  KO1 ; KO2 ; KO3,

where the pairs are (WT1, KO1), (WT2, KO2), (WT3, KO3) that have been sequenced at different times by different technologies (there is a batch effect, as shown also by PCA or MDS display).

When i do the differential analysis of gene expression, I take into account the pairs WT-KO, that in this case coincide with the batches. The question would be : is the R code below correct in using paired design and doing the batch correction, or shall you have any suggestions, please let me know ;)

library("edgeR")
eset <- read.delim("RNAseq",row.names="Geneid")

group <- factor(c("wt","wt","wt","ko","ko","ko"))
group <- relevel(group,ref="wt")

subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))

design <- model.matrix(~group+subject)

y <- DGEList(counts=eset, group=group)
keep <- rowSums(cpm(y) > 0.5) >= 6
y <- y[keep,,keep.lib.sizes=FALSE]

y <- calcNormFactors(y)

### MDS before batch correction

logCPM <- cpm(y, log=TRUE, prior.count=5)
plotMDS(logCPM, col=as.numeric(batch))

### MDS after batch correction

logCPM <- removeBatchEffect(logCPM, batch=batch)
plotMDS(logCPM, col=as.numeric(batch))

#############

fit <- lmFit(logCPM, design)
fit <- eBayes(fit,trend=TRUE, robust=TRUE)

pdf("after.batch.correction.plotSA")
plotSA(fit)
dev.off()

results <- topTable(fit, coef=2, adjust="fdr", number=Inf)
write.table(results, file="results.edgeR.txt",
                 sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

 

 

limma edgeR • 2.5k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

There is no need to do any batch correction. The baseline differences between subjects, including any batch effect, is already accounted for in the paired analysis.

ADD COMMENT
0
Entering edit mode

Dear Gordon, thank you very much for a quick and helpful reply ;) A question though please, although I see of course that the paired analysis accounts for the batch effects

would it be fine to keep these lines of code below (before lmFit (logCPMc, design)):

logCPMc <- removeBatchEffect(logCPM, batch=batch)
plotMDS(logCPM, col=as.numeric(batch))

as the results seem to look better on the MDS plot after removeBatchEffect. thank you !

 

ADD REPLY
2
Entering edit mode

You can use removeBatchEffect() for the MDS plot. That's the main purpose of the function, for plotting purposes.

ADD REPLY
0
Entering edit mode

Dear Gordon, thanks a lot for your help ;) Have a good and fruitful week ;)

ADD REPLY
0
Entering edit mode

In this case, subject and batch are identical:

subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,2,3,1,2,3))

But how about the following example? 

subject <- as.factor(c(1,2,3,1,2,3))
batch <- as.factor(c(1,1,1,2,2,1))

What is the best model design to correct batch effect and to predict DEG between WT and KO groups?

 

ADD REPLY
0
Entering edit mode

Please start a new post for a new question.

ADD REPLY

Login before adding your answer.

Traffic: 739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6