Question: DESeq2 multiple group comparison (ANOVA) with batch effect
0
gravatar for amin.ghareyazi
28 days ago by
amin.ghareyazi0 wrote:

Hi

I'm using DESeq2 to analyze raw read counts. I have 5 groups and 3 batches (different projects). In order to find out which genes are expressed differentially in all groups and also adjust for batch effects I conducted the LRT test step by step from the guideline. This is my code:

rna.o=DESeqDataSetFromMatrix(data,colD,design = ~projects+subtypes) rna.d=DESeq(rna.o,parallel = T,reduced = ~projects,test = "LRT") rna.res=results(rna.d,parallel = T)

as a result this code gave me thousands of highly significant genes. I took normalized counts and removed batch effects by comBat in SVA package to plot a heatmap of these significant genes. no significant differences was observable even in genes with adjusted p-values less than 10^-100. it is evident in this heatmap: https://drive.google.com/file/d/1EhIWQXYAhH1LNy5YJlLX6EG0RSA4Qpiq/view?usp=sharing

below link is a heatmap before the batch effect removal and the batch effect is completely evident: https://drive.google.com/open?id=1RSq3AaR9ucL_a03y4KAmgy17J0iFyePC

so i decided to use plotCounts to find out how much is the difference in groups for my top genes. this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh this is the code:

plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup="Subtypes") . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup="Subtypes")

It is completely evident that there are minor difference over groups in top 5 genes (p-value<10^-200). In the next step I plotted counts like this:

plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup=c("projects","Subtypes")) . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup=c("projects","Subtypes"))

this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh

there are more differences in combination of groups and projects but I can't see enough difference for a p-value like that. what is the problem and what should I do? should i just normalize by DESeq2, remove batch effect and move to limma?

ADD COMMENTlink modified 28 days ago by Michael Love25k • written 28 days ago by amin.ghareyazi0

Have you make an MDS plot? I am curious to see whether group 2 is located far apart from the rest.

ADD REPLYlink written 28 days ago by mikhael.manurung170

I tried first two PCs. Three projects were completely distinguished. But ComBat() did an excellent job in removing batch effects and batches couldn't be observed. see this plot. I'm just wondering if DESeq2 can cope with batch effect like ComBat?

ADD REPLYlink modified 27 days ago • written 27 days ago by amin.ghareyazi0
Answer: DESeq2 multiple group comparison (ANOVA) with batch effect
2
gravatar for Michael Love
28 days ago by
Michael Love25k
United States
Michael Love25k wrote:

For heatmaps, we have recommended code in the workflow. It’s hard to see anything on the count scale in a heatmap, we have recommended transformations to use.

In your first counts plot, I see differences across groups. In particular group 2 often appears different than the rest. The pvalue is for the null that all groups are equal. So I’m not sure what’s the issue exactly.

If you want to find genes where groups 1, 3-5 are different from each other you could remove group 2 and perform a similar LRT.

ADD COMMENTlink written 28 days ago by Michael Love25k

Thanks for your fast and precise reply Michael. Do you mean that differences are statistically significant but they are not observable in these plots? and the results are well adjusted for batch effect? I can't see differences in plotCounts for groups (third plot) but there are observable differences in plotCounts for projects:groups (fourth plot)

ADD REPLYlink modified 28 days ago • written 28 days ago by amin.ghareyazi0

In those plots, does group 2 look like groups 1, 3-5 to you? It has ~7 samples with counts an order of magnitude higher. Are we talking about the same plot?

ADD REPLYlink written 28 days ago by Michael Love25k

What is your opinion on fourth plot? Do you think group 2 with only 10 samples (297 samples in other groups) can cause these significant p-values and batch effects have no role in this?

ADD REPLYlink written 28 days ago by amin.ghareyazi0
1

Group 2 is mostly one batch, which also has samples from 3, 4 and 5.

Plot 4 looks nothing like null genes. If you have any specific additional questions, feel free to ask, but I don't see any issue with these genes rejecting the null.

Again, if you want to see differences across groups 1, 3-5, I've suggested how to find such genes.

ADD REPLYlink written 28 days ago by Michael Love25k

Final question: If I want to compare one group to multiple groups, I used this code: (for 2 VS 1,3,4,5) dds=DESeqDataSetFromMatrix(data,colD,design = ~projects+Subtypes) a=DESeq(dds) b=results(a,contrast = c(0,2,-1/4,-1/4,-1/4,-1/4,-1/4) ) Is it ok?

ADD REPLYlink modified 27 days ago • written 27 days ago by amin.ghareyazi0
1

I answered this in the new thread you started

https://support.bioconductor.org/p/123973/

Just to clarify, this is different than my recommendation above, where it sounds like you may want to exclude group 2, as you are not interested in finding these genes where group 2 is different than the rest. It would be simple to find these, then exclude group 2 to find additional genes with changes among the other groups with another LRT.

ADD REPLYlink written 27 days ago by Michael Love25k

thanks Michael. I understood your recommendation. I will do as exactly you said. just to mention, those heatmaps are exactly outputs of guideline codes and it is not in count scale. It is normalized. Those 10 samples have great impact on results.

ADD REPLYlink written 27 days ago by amin.ghareyazi0

I think the sum of the contrasts should be 1, no?

ADD REPLYlink written 27 days ago by mikhael.manurung170
1

In many cases the contrasts we recommend sum to 0. E.g. c(0,-1,1), or c(0,-1/2,-1/2,1/2,1/2), etc. There's not a rule that they should sum to 1, it depends on the design matrix.

ADD REPLYlink written 27 days ago by Michael Love25k

good point. thanks. but how can it find out which column of colD is batch and which is groups?

ADD REPLYlink written 27 days ago by amin.ghareyazi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 194 users visited in the last hour