Hi,

I use DESeq2 in R for differential expression analysis. I already know the design is terrible, but this is what I have to work with ;)

Question 1: Can I remove the batch effect between 2 groups if their samples do not share a batch, but some of their samples share a batch with a third group? (Groups are different genotypes, batches are different experiments.)

Example: I want to compare group gA (samples s1, s2, s3) with group gC (samples s8, s9, s10, s11). These samples come from different batches (batch bX and batch bY), and no sample of group gA shares a batch with any sample of group gC, so with these two groups alone a batch effect removal would be impossible. However, there is group gB (samples s4, s5, s6, s7), which has 2 samples from batch bX and 2 from batch bY.
    myColData <- data.frame(
      row.names = c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11"),
      group = c("gA","gA","gA","gB","gB","gB","gB","gC","gC","gC","gC"),
      batch = c("bX","bX","bX","bX","bX","bY","bY","bY","bY","bY","bY")
    )
    print(myColData)
    myCounts <- matrix(round(runif(1100, 0, 20)), ncol=11, nrow=100)
My approach would be to:
1) create a DESeq2 dataset object with the known batch effect in the design
2) run the DESeq function on it
3) extract the results with only gC and gA given in the contrast
    library(DESeq2)
    myDE1 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~batch+group)
    myDE2 <- DESeq(myDE1)
    myDE3 <- results(myDE2, contrast=c("group", "gC", "gA"))
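To make the setup concrete, here is a minimal sketch (base R only, no DESeq2 needed) that checks whether the batch and group effects can be separated at all. It rebuilds the sample table from above, cross-tabulates group against batch, and tests whether the design matrix for `~batch+group` has full column rank. The variable names `designAll`, `designSub`, `fullRankAll` and `fullRankSub` are made up for this illustration, and a full-rank design only says the model can be fitted, not that the batch correction is reliable with so few samples.

```r
# Sample table as in the post
myColData <- data.frame(
  row.names = paste0("s", 1:11),
  group = factor(c(rep("gA", 3), rep("gB", 4), rep("gC", 4))),
  batch = factor(c(rep("bX", 5), rep("bY", 6)))
)

# Which groups occur in which batch? Only gB appears in both.
print(table(myColData$group, myColData$batch))

# All three groups: is the design matrix of full column rank?
designAll <- model.matrix(~ batch + group, data = myColData)
fullRankAll <- qr(designAll)$rank == ncol(designAll)

# Only gA and gC: batch and group are now perfectly confounded
subColData <- droplevels(myColData[myColData$group != "gB", ])
designSub <- model.matrix(~ batch + group, data = subColData)
fullRankSub <- qr(designSub)$rank == ncol(designSub)

print(c(allGroups = fullRankAll, gAandgConly = fullRankSub))
```

With all three groups the design should be of full rank, because gB links the two batches. With gA and gC alone the batch and group columns are identical, so the two effects cannot be told apart.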
My 1st question: Is this a correct way to handle the batch effect?

My 2nd question: Let's assume there is no batch here, just the groups. DESeq2 gives different results depending on what I provide as input in step 1. In one case I give it the whole matrix with all samples and then compare only the groups I care about in step 3 (as done above: the input was gA, gB and gC, but I am only interested in gC vs. gA). In the other case I give it a matrix containing only the groups of interest to begin with. Is one of the ways incorrect? Shouldn't the results (myDE03 and myDE13) theoretically be the same?
    # input whole matrix, compare only gC and gA in the end
    myDE01 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~group)
    myDE02 <- DESeq(myDE01)
    myDE03 <- results(myDE02, contrast=c("group", "gC", "gA"))
    myDE03

    # input only samples from gC and gA and compare them
    myDE11 <- DESeqDataSetFromMatrix(myCounts[,c(1:3,8:11)], colData=myColData[c(1:3,8:11),], design=~group)
    myDE12 <- DESeq(myDE11)
    myDE13 <- results(myDE12, contrast=c("group", "gC", "gA"))
    myDE13
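For comparison, the same question can be posed for an ordinary linear model, which may help narrow down where the difference comes from. The sketch below is only an analogy using `lm` on simulated values for a single hypothetical gene; it is not what DESeq2 does internally, and the simulated data, the seed and the names `fitAll`, `fitSub`, `estAll` and `estSub` are assumptions made for the illustration. It fits the model once on all three groups and once on gA and gC only, then compares the estimate and standard error of the gC vs. gA difference.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical log-scale expression values for one gene
group <- factor(c(rep("gA", 3), rep("gB", 4), rep("gC", 4)))
dat <- data.frame(
  group = group,
  y = rnorm(length(group), mean = c(5, 6, 7)[as.integer(group)], sd = 1)
)

# Fit on all groups; with gA as reference, "groupgC" is the gC - gA difference
fitAll <- lm(y ~ group, data = dat)

# Fit on gA and gC only
datSub <- droplevels(dat[dat$group != "gB", ])
fitSub <- lm(y ~ group, data = datSub)

estAll <- coef(summary(fitAll))["groupgC", c("Estimate", "Std. Error")]
estSub <- coef(summary(fitSub))["groupgC", c("Estimate", "Std. Error")]

# Same estimated difference, but a different standard error, because the
# residual variance is pooled over different sets of samples
print(rbind(allGroups = estAll, gAandgConly = estSub))
print(c(dfAll = df.residual(fitAll), dfSub = df.residual(fitSub)))
```

In this analogy the estimated difference is identical in both fits, while the standard error changes because the gB samples contribute to the variance estimate. If something similar applies to the dispersion estimates in DESeq2, that could explain why myDE03 and myDE13 are not the same.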
Any help is appreciated.