Hi
I would like to correct a batch effect using DESeq2 in order to analyze one hundred tumor RNA-Seq samples, without any experimental design. 80 tumors were sequenced in 2018 and 20 in 2019; I can see a strong batch effect between the 2018 and 2019 tumors when I plot a PCA on the read counts or on the data after DESeq.
Usually I use:
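library(DESeq2)
# one dummy column, one row per sample, row names matching the count matrix columns
coldata <- data.frame(group = rep(TRUE, ncol(COUNT)), row.names = colnames(COUNT))
# intercept-only design: no condition and no batch term
dds <- DESeqDataSetFromMatrix(COUNT, coldata, design = ~ 1)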
I read that DESeq can correct batch effect with this kind of command:
dds <- DESeqDataSetFromMatrix(COUNT, coldata, design = ~ batch + condition)
But in my case I have no condition. I tried "design = ~ batch", but it had no effect. I can remove the batch effect efficiently with ComBat, with very good results on the PCA plot, but then I have negative values in my matrix, which is a problem for further analysis.
Which solution can I try?
Thank you for any suggestion.
Thank you for your response. But even after adding batch to the design, I still see a strong batch effect when I plot a PCA with the DESeq matrix, so I can't use this matrix to cluster the tumors.
When I remove the batch effect with ComBat, I get very good results on the PCA plot, but I can't use the ComBat matrix because of the negative values.
So what solution do you recommend?
There’s a FAQ covering exactly your question in the DESeq2 vignette:
https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
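In short, adding batch to the design only changes the model that DESeq() fits; it does not alter the transformed values that go into the PCA. The approach in that FAQ is to remove the batch term from the variance-stabilized matrix with limma::removeBatchEffect. Below is a minimal sketch of that idea on simulated data. The simulated dataset, the "2018"/"2019" labels, and the object names (vsd_corrected, mat) are assumptions for illustration only; with real data you would build dds from your own count matrix with a batch column in coldata. Since there is no condition here, no design matrix is passed to removeBatchEffect.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for the real data: 2000 genes, 12 samples in two groups.
# The built-in two-group factor is reused as a hypothetical sequencing batch.
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, betaSD = 1)
dds$batch <- factor(ifelse(dds$condition == "A", "2018", "2019"))
design(dds) <- ~ batch

# Variance-stabilizing transformation; values are on a log2-like scale.
vsd <- vst(dds, blind = FALSE)

# Subtract the estimated batch effect from the transformed values.
mat <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch)

# Keep the uncorrected object and store the corrected matrix in a copy.
vsd_corrected <- vsd
assay(vsd_corrected) <- mat

# PCA before and after correction, colored by batch.
plotPCA(vsd, intgroup = "batch")
plotPCA(vsd_corrected, intgroup = "batch")
```

The corrected matrix is meant for visualization and clustering. For differential testing with DESeq(), the usual advice is to keep the raw counts and leave batch in the design rather than feeding corrected values back in.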