Hello everyone,
I am running differential expression analysis between patients and healthy control
In the expression dataset the biological variable of interest and batch are partially confounded. Even though batch effect is not particularly strong on Principal components 1 versus 2, I would prefer to run some batch adjustment. However, when I run ComBat WITHOUT including my disease indicator in the model (according to the updated sva vignette) the number of differentially expressed genes is 5 vs 28 without running combat.
I am concerned that batch correction have removed actual biological differences. What approach should I use to feel
more comfortable about a possible batch effect inflating my results?
Thank you
Adam Jaros
To illustrate I added 1) batches in my actual dataset 2) what I want to do - illustrated on bladderbatch data
1)batches in my actuall dataset
batch | Control | Cond1 | Cond2 |
batch 1 | 8 | 0 | 0 |
batch 2 | 4 | 0 | 0 |
batch 3 | 6 | 0 | 0 |
batch 4 | 6 | 0 | 0 |
batch 5 | 0 | 7 | 9 |
batch 6 | 1 | 9 | 10 |
batch 7 | 0 | 7 | 6 |
batch 8 | 5 | 0 | 0 |
batch 9 | 7 | 0 | 0 |
batch 10 | 5 | 1 | 2 |
batch 11 | 7 | 8 | 10 |
batch 12 | 17 | 0 | 0 |
batch 13 | 8 | 0 | 0 |
2) To show the difference
library(sva) library(oligo) library(bladderbatch) library(limma) # load data data(bladderdata) p<-pData(bladderEset) # create subset similar to what I have use<-sampleNames(bladderEset)[p$outcome=='Normal' | p$outcome=='sTCC-CIS'] subEset<-bladderEset[,use] p<-pData(subEset) xtabs(~batch+cancer,p)
# create model for combat and run it mod<-model.matrix(~1,data=p) batch<-p$batch combat<-subEset # duplicate subEset exprs(combat)<-ComBat(dat=exprs(subEset),batch=batch,mod=mod) groups<-p$outcome=='sTCC-CIS' #define groups for comparison par(mfrow=c(1,2)) # graphics # create a volcano plot for each of them for (eset in list(combat,subEset)){ p<-pData(eset) e<-exprs(eset) d<-rowMeans(e[,groups],na.rm=T)-rowMeans(e[,!groups],na.rm=T) # do row ttests tt<-rowttests(e,factor(groups)) lod<--log10(tt[['p.value']]) ## volcano plots smoothScatter(d,lod,xlim=c(-3,3),ylim=c(0,10)) abline(h=2,v=c(-1,1)) }
3) bladderbatch subset
cancer
batch Biopsy Cancer Normal
2 0 13 4
3 0 0 4
5 0 3 0