Question

DESeq2 formula to include batch AND patient, and condition ?

0

Entering edit mode

sxv • 0

@sxv-14831

Last seen 5.3 years ago

Hi,

I have 26 samples from 12 patients sequenced in two batches. 6 of these are "normal" samples while the other 20 are tumors.

What I have done before is simply compare the tumors against the normals, and include the batch variable as follows:

design.file$batch <- as.factor(design.file$batch)
dds <- DESeqDataSetFromMatrix(countData = read_counts, colData = design.file, design = ~ batch + TvNL)
dds <- DESeq(dds, parallel=TRUE)
res <- results(dds, contrast=c("TvNL", "dx1", "dx0"))

However, now I would like to perform this analysis but take in to account the patient variable. I have tried this but the results do not make sense:

design.file$patient <- as.factor(design.file$patient)
dds <- DESeqDataSetFromMatrix(countData = read_counts, colData = design.file, design = ~ batch + patient + TvNL)

I have read the vignettes but am confused about whether I need to use an interaction term. Is this the formula I am looking for?:

design = ~ batch + patient + TvNL + patient:TvNL

Thanks and I am happy to clarify and or provide any additional information.

Sujay

deseq2 • 1.1k views

ADD COMMENT • link updated 5.4 years ago by Michael Love 43k • written 5.4 years ago by sxv • 0

score 0 · Answer 1 · 2019-07-14

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 11 hours ago

United States

What do you mean take into account patient? Is there any patient replication, or one sample per patient? If the latter you don’t include it in the design, but use your original design.

ADD COMMENT • link 5.4 years ago Michael Love 43k

0

Entering edit mode

There are between 1 and 4 Tumor samples per patient, and sometimes (but not always) a Normal sample.

ADD REPLY • link 5.4 years ago sxv • 0

1

Entering edit mode

For this kind of mixed design, where some patients have additional tumor samples, and others not, and some paired with normal and others not, I'd recommend limma-voom with duplicateCorrelation(), because this is a more flexible approach than adding fixed effects to a GLM. It's not always possible to control for various correlations with fixed effects, while duplicateCorrelation() allows for specification of correlations among samples.

Take a look at the limma User Guide and perhaps post a new question tagging limma if you have further questions.

ADD REPLY • link 5.4 years ago Michael Love 43k