Question

DESeq2 design formula and batch correction

lransom • 0

Hi everyone,

I'm trying to decide which variables to include in the design formula for my DESeq2 dataset, and I'm confused by how "controlling for" various variables affects my results.

For example, I am wondering if it makes sense to include "batch" in my design. Before doing any analysis, I made PCA plots from my data, and the samples do not seem to cluster by batch. If I run DESeq to compare across disease state without controlling for batch in my design and then generate a heatmap with hierarchical clustering, the samples from the same batch do not cluster together either. However, if I do add batch to my DESeq dataset design, run DESeq, and then create the same hierarchical heatmap, now the samples from the same batch cluster together.

Why would samples from the same batch start to cluster together after "controlling for" batch? I see similar changes in clustering behavior between the samples if I add other variables to my design such as subject age or sex. Does this mean it is not appropriate for me to control for these variables, or am I fundamentally not understanding what it means to "control" for variables by adding them to the design formula?

My design formula when comparing across disease state and controlling for batch is:

txi = readRDS("txi_1-28.rds")
meta <- fread("variables.csv")
meta <- as.data.frame(meta)
row.names(meta) = meta$ID
meta$Batch = factor(meta$Batch)
ddsMat <- DESeqDataSetFromTximport(txi, colData = meta, design = ~ Batch + Disease)
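As a side note on what a design like the one above encodes: the design formula tells the model which coefficients to fit and does not modify the counts themselves. A minimal base-R sketch, using a made-up six-sample table (`meta_demo` and all of its values are hypothetical, not the poster's data), shows the model matrix that `~ Batch + Disease` expands to:

```r
# Hypothetical metadata: sample IDs, batches, and disease labels are invented for illustration.
meta_demo <- data.frame(
  ID      = paste0("S", 1:6),
  Batch   = factor(c("1", "1", "2", "2", "3", "3")),
  Disease = factor(c("control", "case", "control", "case", "control", "case"),
                   levels = c("control", "case"))
)
rownames(meta_demo) <- meta_demo$ID

# Model matrix implied by design = ~ Batch + Disease:
# an intercept, one indicator column per non-reference batch, and one column for Disease.
mm <- model.matrix(~ Batch + Disease, data = meta_demo)
print(mm)

# The design is only estimable if the matrix has full column rank.
full_rank <- qr(mm)$rank == ncol(mm)
print(full_rank)
```

In this toy layout the `Diseasecase` column is the disease effect estimated alongside the batch indicator columns, which is what "controlling for" batch amounts to in the model fit.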

I am new to RNA-seq analysis and the only one in my lab working on it, so I have been googling away to try to answer my own questions, but I am still confused by this phenomenon.

Thank you so much!

DESeq2 • 838 views

updated 3.1 years ago by Michael Love 41k • written 3.1 years ago by lransom • 0

Could you show some plots and the colData to see how batch is connected to disease in your samples?
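One way to summarize how batch relates to disease is a cross-tabulation of the two columns. The sketch below uses a hypothetical `meta_demo` table with the same column names as the design in the question (`Batch`, `Disease`); with the real data the equivalent call would presumably be `table(meta$Batch, meta$Disease)`.

```r
# Hypothetical metadata standing in for the real colData; all values are invented.
meta_demo <- data.frame(
  ID      = paste0("S", 1:8),
  Batch   = factor(c("1", "1", "1", "2", "2", "2", "3", "3")),
  Disease = factor(c("control", "case", "case", "control",
                     "control", "case", "control", "case"))
)

# Count samples in every Batch x Disease combination.
batch_by_disease <- table(Batch = meta_demo$Batch, Disease = meta_demo$Disease)
print(batch_by_disease)

# A zero cell would mean some batch contains only one disease group,
# i.e. batch and disease are at least partly confounded.
has_empty_cell <- any(batch_by_disease == 0)
print(has_empty_cell)
```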

3.1 years ago • ATpoint ★ 4.0k

score 0 · Answer 1 · 2021-03-07

Michael Love 41k

"However, if I do add batch to my DESeq dataset design, run DESeq, and then create the same hierarchical heatmap, now the samples from the same batch cluster together."

I need code to help diagnose what is happening. See the posting guide.

3.1 years ago • Michael Love 41k