Question: Batch correction with DeSeq2: problem with creating model and getting all conditions
0
8 weeks ago by

I have RNAseq data for three conditions (A, B, and C) with results from three different batches (PK, GB, and EG) and I am trying to apply batch correction using DeSeq2. I am intersted in contrasting all three condtions (i.e. A vs B, B vs C and A vs C) while taking the batch effect into account. My code is as follow:

# Assign condition and batch
(condition <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3))))
(batch <- factor (c("PK","GB", "EG","PK","GB", "EG","PK","GB", "EG")))

# Analysis with DESeq2 ----------------------------------------------------

library(DESeq2)

# Create a coldata frame and instantiate the DESeqDataSet
(coldata <- data.frame(row.names=colnames(countdata), condition, batch))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~batch + condition)

# Run the DESeq pipeline
dds <- DESeq(dds)


When I run this code and then do resutsNames(dds), I get the following:

[1] "Intercept"        "batch_GB_vs_EG"   "batch_PK_vs_EG"   "condition_B_vs_A" "condition_C_vs_A"


However, I am also interested in "conditionAvsC" and I am not sure why this is not being created by this design formula. Also, I am not sure why "batchPKvsGB" is not being created. Is there something wrong with my design formula?

deseq2 • 99 views
modified 8 weeks ago by Michael Love23k • written 8 weeks ago by pkhadka0
Answer: Batch correction with DeSeq2: problem with creating model and getting all condit
0
8 weeks ago by
Michael Love23k
United States
Michael Love23k wrote:

You can extract contrasts of interest using the contrast argument of results. Take a look at the help page:

?results


The answer as to why certain coefficients are listed involves how linear models are represented in R. You can take a look at course material about linear models, for example the course I taught with Rafael Irizarry on Edx has some online material about linear models.

Thank you! Just another quick follow-up question: I want to get normalized counts corrected for the batch effect for all three conditions. I know that I can get normalized counts using "counts (dds, normalized=TRUE)", but I am not sure if the counts here is corrected for the batch effect?

I’d recommend vst followed by the limma::removeBatchEffect function.

I have actually used that function to do this before. I am just wondering if the normalized counts generated by DeSeq2 are batch corrected? And more importantly, are the DE genes that I get using the results function batch corrected?

DE results control for batch (this is the same design as the vignette where we talk about controlling for covariates).

No, counts normalized or not, as well as transformed counts (vst, rlog) are not batch corrected in DESeq2.