Question

DESeq2 multiple group comparison (ANOVA) with batch effect

0

Entering edit mode

amin.ghareyazi • 0

@aminghareyazi-21539

Last seen 4.8 years ago

Hi

I'm using DESeq2 to analyze raw read counts. I have 5 groups and 3 batches (different projects). In order to find out which genes are expressed differentially in all groups and also adjust for batch effects I conducted the LRT test step by step from the guideline. This is my code:

rna.o=DESeqDataSetFromMatrix(data,colD,design = ~projects+subtypes) rna.d=DESeq(rna.o,parallel = T,reduced = ~projects,test = "LRT") rna.res=results(rna.d,parallel = T)

as a result this code gave me thousands of highly significant genes. I took normalized counts and removed batch effects by comBat in SVA package to plot a heatmap of these significant genes. no significant differences was observable even in genes with adjusted p-values less than 10^-100. it is evident in this heatmap: https://drive.google.com/file/d/1EhIWQXYAhH1LNy5YJlLX6EG0RSA4Qpiq/view?usp=sharing

below link is a heatmap before the batch effect removal and the batch effect is completely evident: https://drive.google.com/open?id=1RSq3AaR9ucL_a03y4KAmgy17J0iFyePC

so i decided to use plotCounts to find out how much is the difference in groups for my top genes. this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh this is the code:

plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup="Subtypes") . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup="Subtypes")

It is completely evident that there are minor difference over groups in top 5 genes (p-value<10^-200). In the next step I plotted counts like this:

plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup=c("projects","Subtypes")) . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup=c("projects","Subtypes"))

this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh

there are more differences in combination of groups and projects but I can't see enough difference for a p-value like that. what is the problem and what should I do? should i just normalize by DESeq2, remove batch effect and move to limma?

DESeq2 DEG ANOVA Batch effect Comparison • 3.3k views

ADD COMMENT • link updated 5.5 years ago by Michael Love 43k • written 5.5 years ago by amin.ghareyazi • 0

0

Entering edit mode

Have you make an MDS plot? I am curious to see whether group 2 is located far apart from the rest.

ADD REPLY • link 5.5 years ago mikhael.manurung ▴ 280

0

Entering edit mode

I tried first two PCs. Three projects were completely distinguished. But ComBat() did an excellent job in removing batch effects and batches couldn't be observed. see this plot. I'm just wondering if DESeq2 can cope with batch effect like ComBat?

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0

score 2 · Accepted Answer · 2019-08-20

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

For heatmaps, we have recommended code in the workflow. It’s hard to see anything on the count scale in a heatmap, we have recommended transformations to use.

In your first counts plot, I see differences across groups. In particular group 2 often appears different than the rest. The pvalue is for the null that all groups are equal. So I’m not sure what’s the issue exactly.

If you want to find genes where groups 1, 3-5 are different from each other you could remove group 2 and perform a similar LRT.

ADD COMMENT • link 5.5 years ago Michael Love 43k

0

Entering edit mode

Thanks for your fast and precise reply Michael. Do you mean that differences are statistically significant but they are not observable in these plots? and the results are well adjusted for batch effect? I can't see differences in plotCounts for groups (third plot) but there are observable differences in plotCounts for projects:groups (fourth plot)

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0

0

Entering edit mode

In those plots, does group 2 look like groups 1, 3-5 to you? It has ~7 samples with counts an order of magnitude higher. Are we talking about the same plot?

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

What is your opinion on fourth plot? Do you think group 2 with only 10 samples (297 samples in other groups) can cause these significant p-values and batch effects have no role in this?

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0

1

Entering edit mode

Group 2 is mostly one batch, which also has samples from 3, 4 and 5.

Plot 4 looks nothing like null genes. If you have any specific additional questions, feel free to ask, but I don't see any issue with these genes rejecting the null.

Again, if you want to see differences across groups 1, 3-5, I've suggested how to find such genes.

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

Final question: If I want to compare one group to multiple groups, I used this code: (for 2 VS 1,3,4,5) dds=DESeqDataSetFromMatrix(data,colD,design = ~projects+Subtypes) a=DESeq(dds) b=results(a,contrast = c(0,2,-1/4,-1/4,-1/4,-1/4,-1/4) ) Is it ok?

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0

1

Entering edit mode

I answered this in the new thread you started

https://support.bioconductor.org/p/123973/

Just to clarify, this is different than my recommendation above, where it sounds like you may want to exclude group 2, as you are not interested in finding these genes where group 2 is different than the rest. It would be simple to find these, then exclude group 2 to find additional genes with changes among the other groups with another LRT.

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

thanks Michael. I understood your recommendation. I will do as exactly you said. just to mention, those heatmaps are exactly outputs of guideline codes and it is not in count scale. It is normalized. Those 10 samples have great impact on results.

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0

0

Entering edit mode

I think the sum of the contrasts should be 1, no?

ADD REPLY • link 5.5 years ago mikhael.manurung ▴ 280

1

Entering edit mode

In many cases the contrasts we recommend sum to 0. E.g. c(0,-1,1), or c(0,-1/2,-1/2,1/2,1/2), etc. There's not a rule that they should sum to 1, it depends on the design matrix.

ADD REPLY • link 5.5 years ago Michael Love 43k

0

Entering edit mode

good point. thanks. but how can it find out which column of colD is batch and which is groups?

ADD REPLY • link 5.5 years ago amin.ghareyazi • 0