I have samples from one experiment (experiment A) spread across two sequencing runs.
Experiment A consists of 3 biological replicates, and the first biological replicate has 3 technical replicates, which I collapse using the collapseReplicates function.
The second sequencing run also contains samples from a different experiment (experiment B).
Now I want to analyse experiment A using DESeq2, correcting for the batch effect with design=~batch+condition.
When is the best time to remove the samples from experiment B: before or after running DESeq2?
I’ve tried different approaches, and they give different adjusted p-values for the genes.
Below is a simplified example of what I’ve done (in reality there are 3 treatment types).
Does it look reasonable? Can I collapse the experiment B samples in this way, since they are not relevant for this analysis and cluster tightly together in a PCA plot?
Thanks for your help!
library(DESeq2)

# Sample sheet: experiment A (control/treatment) plus the experiment B samples
sample_id <- factor(c("control_1.1", "control_1.2", "control_1.3", "control_2", "control_3",
                      "treatment_1.1", "treatment_1.2", "treatment_1.3", "treatment_2", "treatment_3",
                      "expB_1", "expB_2", "expB_3", "expB_4", "expB_5", "expB_6"))
condition <- factor(c(rep("control", 5), rep("treatment", 5), rep("expB", 6)))
batch <- factor(c(rep("1", 8), rep("2", 8)))
# Technical replicates share a label; the string "NA" puts all six expB samples into one group
collapse <- factor(c(rep("control_1", 3), "control_2", "control_3",
                     rep("treatment_1", 3), "treatment_2", "treatment_3", rep("NA", 6)))
coldata <- data.frame(sample_id, condition, batch, collapse, row.names = 1)

# countdata (not shown) is the raw count matrix, one column per sample
dds <- DESeqDataSetFromMatrix(countdata, coldata, design = ~ batch + condition)
# Sum the counts of all samples that share a collapse label
ddsCol <- collapseReplicates(dds, groupby = dds$collapse)
dds <- ddsCol
dds$condition <- relevel(dds$condition, ref = "control")
dds <- DESeq(dds)
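
For comparison, the "remove experiment B before running DESeq2" variant can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the counts are simulated stand-ins for the real countdata, the sample sheet is a reduced version of the one above (no technical replicates, two expB samples), and the names ddsA and resA are introduced here purely for illustration.

```r
library(DESeq2)

# Simulated counts stand in for the real `countdata` (assumption, illustration only)
set.seed(1)
samples <- c(paste0("control_", 1:3), paste0("treatment_", 1:3), paste0("expB_", 1:2))
n_genes <- 1000
gene_means <- 2^runif(n_genes, 4, 10)  # one mean per gene, recycled down each column
countdata <- matrix(rnbinom(n_genes * length(samples), mu = gene_means, size = 10),
                    ncol = length(samples),
                    dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Reduced sample sheet: both conditions of experiment A occur in both batches
coldata <- data.frame(
  condition = factor(c(rep("control", 3), rep("treatment", 3), rep("expB", 2))),
  batch     = factor(c("1", "1", "2", "1", "2", "2", "2", "2")),
  row.names = samples
)
ddsCol <- DESeqDataSetFromMatrix(countdata, coldata, design = ~ batch + condition)

# Drop the experiment B samples before fitting, then drop the now-empty factor level
ddsA <- ddsCol[, ddsCol$condition != "expB"]
ddsA$condition <- droplevels(ddsA$condition)
ddsA$condition <- relevel(ddsA$condition, ref = "control")

# Size factors and dispersions are now estimated from experiment A samples only
ddsA <- DESeq(ddsA)
resA <- results(ddsA, contrast = c("condition", "treatment", "control"))
```

In this variant the experiment B samples do not contribute to the size factor or dispersion estimates, which is one reason the adjusted p-values can differ from a fit that still includes them.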