Question

differential expression analysis in edgeR when you have 2 experiments and batch effect in 1 of the 2 experiments.

0

Entering edit mode

sezimmer • 0

@sezimmer-8642

Last seen 7.5 years ago

United States

Dear Bioconductor Community,

I am trying to do a differential expression analysis between different cell types, where I have four different cell types from two different experiments. Two of the cell types come from one experiment and four cell types come from the other. I have a total of 18 samples. I have also detected that there is a batch effect in the samples coming from the second experiment. How would I do a differential expression analysis when I have a batch effect in only 12 of my 16 samples? Would I make a design matrix that includes the batch effect as well as the experiment number? Below is an example of what I am describing.

cellType <- factor(c("cellType1","cellType2","cellType1","cellType2","cellType1","cellType2",rep("cellType3",3),rep("cellType4",3),rep("cellType5",3),rep("cellType6",3)))
batches2 <- factor(c(0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0))
experiment <- factor(c(0,0,0,0,0,0,rep("1",12)))
myDesign <- model.matrix(~0+cellType+experiment+batches2)
colnames(myDesign) <- c("cellType1","cellType2","cellType3","cellType4","cellType5","cellType6","experiment","batch")

Also, if I wanted to get the values of the batch corrected data after doing differential expression analysis would I put the experiment as a covariate or would I include it in the design? Below is an example of both methods.

log2RPKMmat <- matrix(sample(1:16,90,replace=T), ncol=18)
batchVals_Covariate <- removeBatchEffect(log2RPKMmat,batch=batches2,design=model.matrix(~0+cellType),covariates=experiment)
batchVals_noCovariate <- removeBatchEffect(log2RPKMmat,batch=batches2,design=model.matrix(~0+cellType+experiment))

I also looked at the paper "Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses." It says that a two-way ANOVA model (which I believe I may be describing above) could cause false positives because my study is unbalanced. Are there any other methods you would propose?

Thank you!

Sam

edger batch effect differential expression • 994 views

ADD COMMENT • link 7.5 years ago sezimmer • 0

score 1 · Answer 1 · 2016-10-13

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 18 hours ago

The city by the bay

Your design matrix looks fine, though it seems you've mixed up your column names for the batch and experiment factors (not that it really matters, because you're only interested in the cell types). You can then use makeContrasts to construct contrasts to compare any two cell types. As to the paper you're referring to, I've only glanced at it but I believe it describes the case where you remove the batch effects prior to the analysis. If you include the batches as blocking factors in the design matrix, as you have done, there shouldn't be any loss of error control because the uncertainty of estimation of the batch effects will be considered in the significance calculations. Finally, to remove the experiment effect, use the batch2 argument in removeBatchEffect to specify a second blocking factor. Don't put it in the design, as it won't be removed if you do that.

ADD COMMENT • link 7.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Aaron for the help. I updated the question above to fix the mix up in labeling the design matrix as well. Unfortunately though, that design matrix did not work. When I used it in the estimateDisp function of edgeR I got an error saying the Design matrix was not of full rank. Do you know why this happened? I looked up the error and found this post: EdgeR Design matrix not of full rank but it does not seem like their are any dependancies in my design matrix.

Again, thanks for your help.

Sam

ADD REPLY • link 7.5 years ago sezimmer • 0

0

Entering edit mode

That shouldn't be the case - are you sure you're running estimateDisp with the same myDesign that you construct in your post? When I run your code, I get a design matrix of full rank (as checked via qr).

ADD REPLY • link 7.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

You are right. I realized I made a mistake in the experiment variable. Instead of

experiment <- factor(c(0,0,rep("1",16)))

<font face="monospace">it should be. </font>

experiment <- factor(c(0,0,0,0,0,0,rep("1",12)))

<font face="monospace">This is because my first experiment is for the first 6 samples, and the second experiment is for the last 12 samples. I edited my question above to reflect the change.</font>

ADD REPLY • link 7.5 years ago sezimmer • 0

0

Entering edit mode

Okay, well, you have a problem, because cell types 3-6 are only present in one experiment and cell types 1-2 are only present in the other. This means that there's no way to distinguish between the experiment effect and that of the difference between cell types. To be able to fit a GLM, you need to drop the experiment factor from your design matrix to estimate the average expression levels for each cell type. However, keep in mind that you cannot compare between cell types from different experiments. edgeR won't stop you from doing so, but you really shouldn't.

ADD REPLY • link 7.5 years ago Aaron Lun ★ 28k