Deseq2 - removing batch effect
Entering edit mode
vinisha • 0
Last seen 2.4 years ago
United States


I have read blogs on how to remove batch effects in deseq2 and how to visualize it using limma voom remove batch effect PCA. I'm not sure if the batch effect was removed in my analysis. Please find the attached output, I'm skeptical because in heat map they seem to cluster separately (with respect to batches).

Analysis: 51 samples with two batches (Batch1 includes NYGC in the label and Batch2 doesn't include NYGC in the label). The two batches are rnaseq data from different sequencers.

PCA before removing batch effect

enter image description here

PCA after removing batch effect


Heat map with batch factor included in design formula design =~ Gender+ageRange+Batch+Cond


Heat map without batch factor in design formula design =~ Gender+ageRange+Cond


I'd be really great if anyone could validate my results. Thank you!

Best, Vinisha

BatchEffect DESeq2 • 1.9k views
Entering edit mode
Last seen 4 hours ago

Hi Vinisha, please always show the code that you used, in this case, the code that you used to input your raw counts, normalise these, generate the first PCA bi-plot, batch correct, and then generate the second PCA bi-plot.

Regarding the heatmap, I would not expect any difference based on inclusion or exclusion of batch from the design, depending on how you are generating these heatmaps.


Entering edit mode

Hi Kevin,

I've edited my post as it had an incorrect image. Kindly take a look again. Attaching the code below,

total_new$Cond <- factor(total_new$Cond, levels = c("ALS", "HC"))
total_new$ageRange <- factor(total_new$ageRange, levels = c("TRUE", "FALSE"))
total_new$Gender <- factor(total_new$Gender, levels = c("M", "F"))
total_new$c9 <- factor(total_new$c9, levels = c("Yes","No"))
total_new$Batch <- factor(total_new$Batch,levels= c("Batch2","Batch1"))

dds<-DESeqDataSetFromMatrix(countData = data_new, colData=total_new, design =~ Gender+ageRange+Cond)
dds <- dds[ rowMeans(counts(dds)) > 20, ]
dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)
res <- results(dds, contrast=c("Cond","ALS","HC"))
res.out <-

#using it only for PCA
vsd <- vst(dds, blind=FALSE)
pcaData <- plotPCA(vsd, intgroup= c("Batch"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color= Batch, label=name)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 

res <- res[!$pvalue),]
resOrdered <- res[(res$pvalue < 0.05),]
ressig <- resOrdered[(resOrdered$log2FoldChange < -1.5) | (resOrdered$log2FoldChange > 1.5) ,]
res_sort <- ressig[order(-abs(ressig$log2FoldChange)),]
mat <- assay(dds_rlog[rownames(ressig),])
pheatmap(mat, cluster_rows = TRUE, cluster_cols= TRUE, scale="row", show_rownames = FALSE)

For batch effect removal I included batch in the design formula.

dds<-DESeqDataSetFromMatrix(countData = data_new, colData=total_new, design =~ Gender+ageRange+Batch+Cond)
assay(vsd)<- limma::removeBatchEffect(assay(vsd), vsd$Batch)
plotPCA(vsd, "Batch")
Entering edit mode

Ah! - now I see. Originally, it was as if literally nothing changed.

Although PC2 only accounts for 3% variation, it would be interesting to repeat everything after having remove the 'outlier' at the top of the original bi-plot. Also, is there anything else that could be confounding the differences between batch1 and batch2? Do you have statistical support for including Gender and ageRange in the design? If these are not contributing to any variation, then, by including them, you may inadvertently introduce bias that did not previously exist.

Entering edit mode

I did try removing the outliers,still no luck! So I ran the analysis separately(with respect to batches) and found Gender and ageRange to be a factor for clustering. That's the reason I included in the combined(Batch1 and Batch2) analysis. I also tried with only Batch in the design formula, they still clustered as per batches. Should I try a different way? Thank you for helping me out on this, Kevin!

Batch1 heatmap

enter image description here

Batch2 heatmap

enter image description here

Entering edit mode

If the batch effect is inconsistent across genes, for whatever reason, then removeBatchEffect() will struggle to adjust for this. You could instead try ComBat-Seq to remove the batch effect at the raw count stage (prior to DESeq2), if you wish; otherwise, try to explore other sources of variation via my package, PCAtools::eigencorplot(): (I have a section on how to use this with DESeq2).

If all else fails, I would proceed with the analysis, possibly stratifying by batch, i.e., doing 2 separate analyses and then corroborating results across both.


Login before adding your answer.

Traffic: 358 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6