Dear Bioconductor List,
I am analyzing an RNAseq experiment taken from two different batches. I have encountered
a strange voom plot and cannot get rid of it by taking batch effects into account as a co-variate. I will present the relevant information, and then ask some questions at the end.
All images can be found at:
Only 1 conditon in the first batch is included in the comparison. The target file (abridged) is:
FileName Name Target Batch
JS006 JS006 b.dor.controlchow.pac 1
...
JS220 JS220 h.dor.acychow.pac 2
There was ome difference in the genes expressed between batch
1 and 2, so I merged the raw count matrices in R and changed
an NAs to zeros.
When I ran Limma-Voom with weights and without batch as a factor I got this strange
non-monotonic voom curve which is entitled:
barplot.acy_rnaseq.voomplot.bothbatches_nobatchfactor.png
When I ran just batch 2 I got an ordinary voom curve
which is entitled barplot.acy_rnaseq.voomplot.newbatchonly.png.
I therefore concluded that the glitch in the first graph is
due to batch effects.
I then tried batch adjustment in Limma, using batch as a
covariate.
MDSplot_bybatch_log_no_correction.pdf
shows that there is a pronounced batch effect.
I appleid a batch correction with the removeBatchEffect
command.
MDSplot_bybatch_log_correction.pdf shows that there is a pronounced batch effect.
I then tried Limma-Voom using Batch as a covariate and I got the same kind
of strange Voom plot (barplot.acy_rnaseq.voomplot.batchcorrect.png). Plus an error message:
> v<-voomWithQualityWeights(y,design=des,normalization="none",plot=TRUE)
Coefficients not estimable: h.dor.acychow.pac
Coefficients not estimable: h.dor.acychow.pac
Warning messages:
1: Partial NA coefficients for 15380 probe(s)
2: Partial NA coefficients for 15380 probe(s)
Here is a record of my session (abridged for charatcer limit)
then my questions:
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
> library(limma)
> library(edgeR)
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Mavericks 10.9.5
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.16.5 limma_3.30.7
loaded via a namespace (and not attached):
[1] grid_3.3.2 locfit_1.5-9.1 lattice_0.20-34
> targets<-readTargets("targetsbatch.txt")
> Counts<-read.table("counts.txt",header=T,sep="\t")
> png("barplot.acy.png")
> barplot(colSums(Counts)*1e-6,names=1:25,ylab="library size (millions)")
> dev.off()
null device
1
> y<-DGEList(counts=Counts, genes=rownames(Counts))
> isexpr<-rowSums(cpm(y)>1) >= 5
> hasannot<-rowSumsis.na(y$genes))==0
> y<-y[isexpr & hasannot,]
> y<-calcNormFactors(y)
> # Now predict result with batch effects removed
> logCPM<-cpm(y,log=TRUE,prior.count=5)
> plotMDS(logCPM,labels=targets$Batch,col=c(rep("red",5),rep("blue",20)))
> logCPMadj<-removeBatchEffect(logCPM,batch=targets$Batch)
> plotMDS(logCPMadj,labels=targets$Batch,col=c(rep("red",5),rep("blue",20))
> f<-factor(targets$Target,levels=c("b.dor.controlchow.pac","e.dor.controlchow.noinjec","f.dor.acychow.noinjec","g.dor.acychow.veh","h.dor.acychow.pac"))
> batch<-factor(targets$Batch)
> des<-model.matrix(~batch+f)
> colnames(des)<- c("(Intercept)","batch2","e.dor.controlchow.noinjec","f.dor.acychow.noinjec","g.dor.acychow.veh","h.dor.acychow.pac")
> des
(Intercept) batch2 e.dor.controlchow.noinjec f.dor.acychow.noinjec g.dor.acychow.veh h.dor.acychow.pac
1 1 0 0 0 0 0
...
25 1 1 0 0 0 1
> png("barplot.acy_rnaseq.voomplotnew.png")
> v<-voomWithQualityWeights(y,design=des,normalization="none",plot=TRUE)
Coefficients not estimable: h.dor.acychow.pac
Coefficients not estimable: h.dor.acychow.pac
Warning messages:
1: Partial NA coefficients for 15380 probe(s)
2: Partial NA coefficients for 15380 probe(s)
> dev.off()
null device
1
>
Questions:
1. Did I make a mistake in the script (especially the model) that led to the error message,if so, what, and how may I correct it
2. If not, is it possible to correct these samples using batch as a
co-variate or is doing so prevented by teh unbalanced design?
3. Should I simply filter out enough genes withg low counts so the curve fits?
4. Do you have any other suggestions (I understand from previous
posts that use of Combat for normalization is counterindicated with Voom).
Thank you as always,
Rich
Richard A. Friedman, PhD
Were your two batches of data quantitated using the same pipeline?
Steve,
Thank you for our immediate reply. As far as I know, "Yes",
because it was done with the same protocol at the same core.
However, I will query the experimentalist directly.
Best wishes,
Rich
The suspicious thing for me is that you needed to do some manual munging/merging of the two datasets and account for NAs after that, which tells me that there were some genes that were quantitated on one side that weren't in the other (or vice versa).
All things being equal, if you're working with raw data coming out of the same pipeline, I would expect you to have a count matrix with the same rows (with the same identify), even if there were 0 counts for some genes, they should probably still be 0.
So, I'd double check that.
While I can't be sure, I imagine the cloud on the left might be those same genes that you had to fiddle 0's to NAs -- maybe.
In any event. Once you double check that, note that you must always filter out lowly expressed genes anyway when using voom.
Steve,
Most of the genes were the same. I forget the number of NAs but
I can check. fewer than 1,000 I think. There are 5 replicates of one condition in the old batch
and 5 replicates each for 4 conditions in the new batch. The filter I used for low expressing genes is:
isexpr<-rowSums(cpm(y)>1) >= 5
What filter would you suggest?
But your points are well taken.
If no differnences show in the protocol, I will try limiting genes studied to those which
appear in both batches, and see if I still have a batch effect.
Does that sound reasonable?
Also, if you (or other readers) have any insight into how I got the error message in
with the voom command, apparently due to an error in the linear model,
I would greatly appreciate it.
Thanks and bets wishes,
Rich