Hi
I'm using DESeq2 to analyze raw read counts. I have 5 groups and 3 batches (different projects). In order to find out which genes are expressed differentially in all groups and also adjust for batch effects I conducted the LRT test step by step from the guideline. This is my code:
rna.o=DESeqDataSetFromMatrix(data,colD,design = ~projects+subtypes) rna.d=DESeq(rna.o,parallel = T,reduced = ~projects,test = "LRT") rna.res=results(rna.d,parallel = T)
as a result this code gave me thousands of highly significant genes. I took normalized counts and removed batch effects by comBat in SVA package to plot a heatmap of these significant genes. no significant differences was observable even in genes with adjusted p-values less than 10^-100. it is evident in this heatmap: https://drive.google.com/file/d/1EhIWQXYAhH1LNy5YJlLX6EG0RSA4Qpiq/view?usp=sharing
below link is a heatmap before the batch effect removal and the batch effect is completely evident: https://drive.google.com/open?id=1RSq3AaR9ucL_a03y4KAmgy17J0iFyePC
so i decided to use plotCounts to find out how much is the difference in groups for my top genes. this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh this is the code:
plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup="Subtypes") . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup="Subtypes")
It is completely evident that there are minor difference over groups in top 5 genes (p-value<10^-200). In the next step I plotted counts like this:
plotCounts(rna.d, gene=rownames(rna.res[1,]), intgroup=c("projects","Subtypes")) . . . plotCounts(rna.d, gene=rownames(rna.res[5,]), intgroup=c("projects","Subtypes"))
this is the result: https://drive.google.com/open?id=18ue-oiTmdlFWrNbRXRQDTpUD4hKOkuGh
there are more differences in combination of groups and projects but I can't see enough difference for a p-value like that. what is the problem and what should I do? should i just normalize by DESeq2, remove batch effect and move to limma?
Have you make an MDS plot? I am curious to see whether group 2 is located far apart from the rest.
I tried first two PCs. Three projects were completely distinguished. But ComBat() did an excellent job in removing batch effects and batches couldn't be observed. see this plot. I'm just wondering if DESeq2 can cope with batch effect like ComBat?