I would like to compare a matrix of raw RNA-seq counts that I downloaded from a repository with my own RNA-seq experiment. Specifically, I want to see which samples from the downloaded paper are closest to my samples. I have tried combining all samples into one matrix, processing the data with DESeq2, and visualizing the PCA and the sample correlations, without success: the downloaded data and my own data separate clearly. The conditions of the full data set can be reproduced this way:
colData <- data.frame(
  condition = c(rep(c("A", "B"), each = 2), rep(c("C", "D", "E"), each = 3)),
  batch = c(rep("1", 7), rep("2", 6))
)
   condition batch
1          A     1
2          A     1
3          B     1
4          B     1
5          C     1
6          C     1
7          C     1
8          D     2
9          D     2
10         D     2
11         E     2
12         E     2
13         E     2
Trying to create the DESeq2 object with the design formula "~condition + batch" returns the "Model matrix not full rank" error. However, I am unsure how to solve it, since this looks to me like a "perfect confounding" problem: each condition occurs in only one batch.
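The confounding can be checked without DESeq2. Below is a minimal base R sketch, assuming the same colData as above (rebuilt here with factors so the snippet runs on its own); the names mm, n_coef and mm_rank are only for illustration. It builds the design matrix for "~condition + batch" with model.matrix and compares its number of columns with its rank:

```r
# Rebuild the sample table from the question (factors make the coding explicit).
colData <- data.frame(
  condition = factor(c(rep(c("A", "B"), each = 2), rep(c("C", "D", "E"), each = 3))),
  batch     = factor(c(rep("1", 7), rep("2", 6)))
)

# Cross-tabulate condition against batch: every condition sits in exactly one batch.
print(table(colData$condition, colData$batch))

# Design matrix for the formula that triggers the error.
mm <- model.matrix(~ condition + batch, data = colData)

n_coef  <- ncol(mm)     # coefficients the model would try to estimate
mm_rank <- qr(mm)$rank  # number of linearly independent columns

cat("coefficients:", n_coef, " rank:", mm_rank, "\n")
```

The rank comes out one lower than the number of columns because the batch2 column equals the sum of the conditionD and conditionE columns, so the batch coefficient cannot be separated from those two condition coefficients.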
I have also tried the recommended approach of applying the limma function removeBatchEffect after a variance-stabilizing transformation, with the design "~condition" only:
mat <- assay(vsd)
mat <- limma::removeBatchEffect(mat, vsd$batch)
assay(vsd) <- mat
plotPCA(vsd)
But I can still see the batch effect. I was wondering whether I should use the ComBat algorithm to remove the batch effects, although it seems that I would first need to clean and normalize the data. Several people have also advised me not to use it and to model the batch effect in DESeq2 instead.
It would be much appreciated if someone could point me in the right direction! Thank you very much!