Question: removing batch effect using DESeq and sva
gravatar for bioyas
5 months ago by
bioyas0 wrote:


I am working on RNA seq data with 39(13 samples with 3 replicates) samples. I would like to do a PCA analysis. Apparently, there is batch effect in data. I would like to remove these batch effects and see how my samples would cluster after removing batches. I am following the instruction from this paper.

here is my experiment design.

   Bio_rep  cond passage Clone  treatment            Names
1        2 extra    P148    B2 extra.P148 P148.extra.rep.2
2        3 extra    P148    B2 extra.P148 P148.extra.rep.3
3        2 intra      P7     B   intra.P7   P7.intra.rep.2
4        3 intra      P7     B   intra.P7   P7.intra.rep.3
5        2 extra      P7     B   extra.P7   P7.extra.rep.2
6        3 extra      P7     B   extra.P7   P7.extra.rep.3
7        1 intra      RH    RH   intra.RH   RH.intra.rep.1
8        2 intra      RH    RH   intra.RH   RH.intra.rep.2
9        3 intra      RH    RH   intra.RH   RH.intra.rep.3
10       1 extra      RH    RH   extra.RH   RH.extra.rep.1
11       2 extra      RH    RH   extra.RH   RH.extra.rep.2
12       3 extra      RH    RH   extra.RH   RH.extra.rep.3
13       2 intra     P11    B2  intra.P11  P11.intra.rep.2
14       3 intra     P11    B2  intra.P11  P11.intra.rep.3
15       1 extra     P85    B2  extra.P85  P85.extra.rep.1
16       1 extra    P148    B2 extra.P148 P148.extra.rep.1
17       1 intra      P7     B   intra.P7   P7.intra.rep.1
18       1 extra      P7     B   extra.P7   P7.extra.rep.1
19       1 intra    P148    B2 intra.P148 P148.intra.rep.1
20       1 intra     P85    B2  intra.P85  P85.intra.rep.1
21       1 intra     P11    B2  intra.P11  P11.intra.rep.1
22       1 extra     P11    B2  extra.P11  P11.extra.rep.1
23       2 extra     P11    B2  extra.P11  P11.extra.rep.2
24       3 extra     P11    B2  extra.P11  P11.extra.rep.3
25       2 intra     P85    B2  intra.P85  P85.intra.rep.2
26       3 intra     P85    B2  intra.P85  P85.intra.rep.3
27       2 extra     P85    B2  extra.P85  P85.extra.rep.2
28       3 extra     P85    B2  extra.P85  P85.extra.rep.3
29       2 intra    P148    B2 intra.P148 P148.intra.rep.2
30       3 intra    P148    B2 intra.P148 P148.intra.rep.3
31       1 extra     P35    B2  extra.P35  P35.extra.rep.1
32       2 extra     P35    B2  extra.P35  P35.extra.rep.2
33       3 extra     P35    B2  extra.P35  P35.extra.rep.3
34       1 extra     P55    B2  extra.P55  P55.extra.rep.1
35       2 extra     P55    B2  extra.P55  P55.extra.rep.2
36       3 extra     P55    B2  extra.P55  P55.extra.rep.3
37       1 extra    P210    B2 extra.P210 P210.extra.rep.1
38       2 extra    P210    B2 extra.P210 P210.extra.rep.2
39       3 extra    P210    B2 extra.P210 P210.extra.rep.3

I have created a DESeqDataSet and used rld() to stabilize the variation then plotted the heat maps and pca. the heat map clearly shows batch effects.

Then I used sva to remove those hidden effects and add the 3 surrogate variables to the design of DESeq object. and then re-plot the PCA and heatmap. But still I get the same plot. It seems like sva has not corrected the effects and unwanted variation.

# DESeq object 
dds <- DESeqDataSetFromMatrix(countData = countDataMatrix,
                              colData = coldata,
                              design = ~treatment)
dds <- estimateSizeFactors(dds)

# filtering the genes that have an average of greater than 10 across all samples
dds <- dds[rowSums(counts(dds)) > 10, ]

#transforming  dds object
rld <- rlog(dds, blind=FALSE)

sampleDists <- dist(t(assay(rld)))

sampleDistMatrix <- as.matrix(sampleDists)
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")))(255)
         main = "euclidean dist from DESeq bject before removing bad samples")

pcadata <- plotPCA(rld, intgroup = c("cond", "passage"), returnData=TRUE)
percentVar <- round(100 * attr(pcadata, "percentVar"))

ggplot(pcadata, aes(PC1, PC2, color= treatment, shape=cond)) + geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  labs(title="PCA before removing batch effect")

#removing batch effects
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]
mod <- model.matrix(~ treatment, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod = mod, mod0 = mod0, = 3)

stripchart(svseq$sv[,1] ~ dds$treatment,vertical=TRUE,main="SV1")
stripchart(svseq$sv[,2] ~ dds$treatment,vertical=TRUE,main="SV2")
stripchart(svseq$sv[,3] ~ dds$treatment,vertical=TRUE,main="SV3")

# DESeq object after removing batch
ddssva <- dds 
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
## add 3 surrogate variable to the dds design 
design(ddssva) <- ~ SV1 + SV2 + SV3 + treatment

# transformation of counts after sva
rldsva <- rlog(ddssva, blind=FALSE)
# sample dist after sva

sampleDists.sva <- dist( t( assay(rldsva) ) )
sampleDistMatrix.sva <- as.matrix( sampleDists.sva )
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)

DO you have any idea where an I doing wrong.

ADD COMMENTlink modified 5 months ago by Michael Love26k • written 5 months ago by bioyas0
Answer: removing batch effect using DESeq and sva
gravatar for Michael Love
5 months ago by
Michael Love26k
United States
Michael Love26k wrote:

Check the FAQ in the vignette, there is one particularly relevant to this question.

ADD COMMENTlink written 5 months ago by Michael Love26k

Thanks for your message. I am looking at F1000 research GateWays to follow the paper. Unfortunately, I was not able to find the FAQ section. Could you please direct me to where I need to the FAQ section so that I can find the answer to my question. I appreciate your help.

ADD REPLYlink written 5 months ago by bioyas0

You asked

Then I ... add the 3 surrogate variables to the design of DESeq object. and then re-plot the PCA and heatmap. But still I get the same plot

I’ll just link directly to the question itself:

ADD REPLYlink written 5 months ago by Michael Love26k

Thanks for your reply. I look at the link and I am able to apply it to my data. But still I see some batches even after removing batches with the removeBatchEffect from Lima package. I have another question regarding SVA. I tries to define batches manually by looking at SV1 and SV2 (surrogate variables) and define batches based on negative and positive values of SVs. I am not sure if this is a good approach to use SVA coordinates to define batch manually. Do you have any ideas how can I use SVA coordinates to define batch? Thanks

ADD REPLYlink written 5 months ago by bioyas0

That's more of an SVA question, so i'll hold back from answering

ADD REPLYlink written 4 months ago by Michael Love26k


you are definitely right that's more about sva. I might need to post it with different topic and tags.


ADD REPLYlink written 4 months ago by bioyas0

I may be missing something, but I don't see why you would want to dichotomize the SVs. You can use limma's removeBatchEffect to remove continuous covariates as well, just put them in the 'covariates' argument.

ADD REPLYlink written 4 months ago by Peter Langfelder2.3k


I have already used 2 SVs as a covariate matrix and passed it to removeBatchEffect() function. After doing so the samples clustering would be disrupted even worse than than before batch correction. I was wondering if this is due to the extra batch effect (still remained in data) or created bias by using SVA. I thought defining batches manually using the SVs could be an approach that removes batches without introducing bias to data. I was wondering if batch correction is working fine in my case.


ADD REPLYlink written 4 months ago by bioyas0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 220 users visited in the last hour