Hi,
I have 26 samples from 12 patients sequenced in two batches. 6 of these are "normal" samples while the other 20 are tumors.
What I have done before is simply compare the tumors against the normals, and include the batch variable as follows:
design.file$batch <- as.factor(design.file$batch)
dds <- DESeqDataSetFromMatrix(countData = read_counts, colData = design.file, design = ~ batch + TvNL)
dds <- DESeq(dds, parallel=TRUE)
res <- results(dds, contrast=c("TvNL", "dx1", "dx0"))
However, now I would like to perform this analysis but take in to account the patient variable. I have tried this but the results do not make sense:
design.file$patient <- as.factor(design.file$patient)
dds <- DESeqDataSetFromMatrix(countData = read_counts, colData = design.file, design = ~ batch + patient + TvNL)
I have read the vignettes but am confused about whether I need to use an interaction term. Is this the formula I am looking for?:
design = ~ batch + patient + TvNL + patient:TvNL
Thanks and I am happy to clarify and or provide any additional information.
Sujay
There are between 1 and 4 Tumor samples per patient, and sometimes (but not always) a Normal sample.
For this kind of mixed design, where some patients have additional tumor samples, and others not, and some paired with normal and others not, I'd recommend limma-voom with duplicateCorrelation(), because this is a more flexible approach than adding fixed effects to a GLM. It's not always possible to control for various correlations with fixed effects, while duplicateCorrelation() allows for specification of correlations among samples.
Take a look at the limma User Guide and perhaps post a new question tagging limma if you have further questions.