Remove batch effect - condition partially confounded by data
2
0
Entering edit mode
AdamJ • 0
@adamj-7282
Last seen 9.2 years ago
United Kingdom

Hello everyone,

I am running differential expression analysis between patients and healthy control

In the expression dataset the biological variable of interest and batch are partially confounded. Even though batch effect is not particularly strong on Principal components 1 versus 2, I would prefer to run some batch adjustment. However, when I run ComBat WITHOUT including my disease indicator in the model (according to the updated sva vignette)  the number of differentially expressed genes is 5 vs 28 without running combat.

I am concerned that batch correction have removed actual biological differences. What approach should I use to feel
more comfortable about a possible batch effect inflating my results?

Thank you

Adam Jaros

To illustrate I added 1) batches in my actual dataset 2) what I want to do - illustrated on bladderbatch data

1)batches in my actuall dataset

batch Control Cond1 Cond2
batch 1 8 0 0
batch 2 4 0 0
batch 3 6 0 0
batch 4 6 0 0
batch 5 0 7 9
batch 6 1 9 10
batch 7 0 7 6
batch 8 5 0 0
batch 9 7 0 0
batch 10 5 1 2
batch 11 7 8 10
batch 12 17 0 0
batch 13 8 0 0

 2) To show the difference

library(sva)
library(oligo)
library(bladderbatch)
library(limma)

# load data
data(bladderdata)
p<-pData(bladderEset)


# create subset similar to what I have
use<-sampleNames(bladderEset)[p$outcome=='Normal' | p$outcome=='sTCC-CIS']
subEset<-bladderEset[,use]
p<-pData(subEset)
xtabs(~batch+cancer,p)
# create model for combat and run it
mod<-model.matrix(~1,data=p)
batch<-p$batch
combat<-subEset # duplicate subEset
exprs(combat)<-ComBat(dat=exprs(subEset),batch=batch,mod=mod)

groups<-p$outcome=='sTCC-CIS' #define groups for comparison
par(mfrow=c(1,2)) # graphics

# create a volcano plot for each of them
for (eset in list(combat,subEset)){
        p<-pData(eset)
        e<-exprs(eset)
        d<-rowMeans(e[,groups],na.rm=T)-rowMeans(e[,!groups],na.rm=T)
        
        # do row ttests
        tt<-rowttests(e,factor(groups))
        lod<--log10(tt[['p.value']])
        
        ## volcano plots
        smoothScatter(d,lod,xlim=c(-3,3),ylim=c(0,10))  
        abline(h=2,v=c(-1,1))
        
        
      
}

3) bladderbatch subset

     cancer
batch Biopsy Cancer Normal
    2      0     13      4
    3      0      0      4
    5      0      3      0
combat differential gene expression sva • 2.6k views
ADD COMMENT
0
Entering edit mode
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.1 years ago
United States

Hi Adam,

Sorry for the delay in response, I've been traveling. One thing to keep in mind is that when batch and the variable you care about are correlated, adjusting for batch effects will very frequently reduce the number of genes called differentially expressed. The danger of skipping batch adjustment is that some of the genes called differentially expressed may only be differentially expressed due to batch effects. 

It is not surprising to see the # of significant genes go from 28 to 5 when adjusting for batch effects. One thing you can do is plot the expression levels for each of the 28 genes colored by batch and colored by the variable you found DE and try to determine if it is possible to distinguish batch effects from effects you care about - but this is usually really dicey and rarely more informative than the Combat analysis in my experience. 

 

Jeff

ADD COMMENT
0
Entering edit mode
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.1 years ago
United States

Hi Adam,

Sorry for the delay in response, I've been traveling. One thing to keep in mind is that when batch and the variable you care about are correlated, adjusting for batch effects will very frequently reduce the number of genes called differentially expressed. The danger of skipping batch adjustment is that some of the genes called differentially expressed may only be differentially expressed due to batch effects. 

It is not surprising to see the # of significant genes go from 28 to 5 when adjusting for batch effects. One thing you can do is plot the expression levels for each of the 28 genes colored by batch and colored by the variable you found DE and try to determine if it is possible to distinguish batch effects from effects you care about - but this is usually really dicey and rarely more informative than the Combat analysis in my experience. 

 

Jeff

ADD COMMENT

Login before adding your answer.

Traffic: 465 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6