Question: Call DE genes on Unbalanced design with controls
0
7 months ago by
timedreamer10
New York University
timedreamer10 wrote:

Hi there,

I googled the question but could not find an answer that solves my problem, so I am posting it here.

Thank you so much in advance!!

I recently received a dataset that has already been sequenced. The idea was: each batch of cells was transfected with one control vector and several gene overexpression vectors. So, in batch one, I have one control vector and six overexpression vectors, each with three replicates. In batch two, I have the same control vector but six different overexpression vectors. The purpose is to find which genes are DE when comparing each overexpression vector with the control samples. I know it's not a good design, but unfortunately the experiment has already been done. BTW, the batch effect is very obvious on PCA. I currently use edgeR for the analysis. The previous analysis was done in DESeq2 by a colleague.

Question 1 is: suppose I use the model design <- model.matrix(~batch+plasmid). Since only the control vector was repeated across batches, does this model make sense? In other words, should I combine all batches and use ~batch+plasmid, OR analyze each batch separately and call DE genes using ~plasmid? I'm not sure which one is statistically better.

Question 2 is: if I repeat the experiment with six randomly picked vectors in each of two batches, will it help? If so, does the help come simply from having more replicates?

Question 3 is: if I redo the experiment, do you recommend putting each replicate in a separate batch, trying to fit a Balanced Incomplete Block (BIB) design or something like that? I can't do one replicate of all TFs in one batch (limited material).

A simple case would be like this:

plasmid <- factor(c(rep("control",3),rep("tf1",3),rep("control",3),rep("tf2",3)))
batch <- factor(c(rep("1",6),rep("2",6)))
design <- model.matrix(~batch+plasmid)
design

   (Intercept) batch2 plasmidtf1 plasmidtf2
1            1      0          0          0
2            1      0          0          0
3            1      0          0          0
4            1      0          1          0
5            1      0          1          0
6            1      0          1          0
7            1      1          0          0
8            1      1          0          0
9            1      1          0          0
10           1      1          0          1
11           1      1          0          1
12           1      1          0          1
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$plasmid
[1] "contr.treatment"

modified 6 months ago • written 7 months ago by timedreamer10
Answer: Call DE genes on Unbalanced design with controls
1
6 months ago by
Aaron Lun
Cambridge, United Kingdom
Aaron Lun wrote:

Given that this is effectively an edgeR question, there's no point putting a DESeq2 tag here unless you want Mike to answer something specific.

Anyway, onto your questions. I'll refer to your simplified design as an example.

1. ~batch+plasmid is fine, assuming that the batch effect is additive. If it's not additive, it's still okay, provided you don't compare plasmidtf1 to plasmidtf2 (i.e., only compare within each batch). It would be unwise to subset your samples and perform the DE analysis separately; you need all the replication you can get.

2. Yes, the more replicates, the better. This gives you more accurate and precise dispersion estimates, which improves power. It also increases the robustness of the analysis to violations of distributional assumptions used in empirical Bayes shrinkage.

3. If you must have batches (e.g., from a logistical perspective), then the ideal design would contain the same number of control, tf1 and tf2 samples in each batch, for multiple batches. But if you can't do that, then a BIB approach would probably be the next best option. Minimize the number of blocks and maximize the overlaps in treatment conditions between blocks, as much as your material limits allow.
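For reference, point 1 could be sketched in edgeR roughly as follows, using the simplified design from the question. This is only an illustration under stated assumptions: the count matrix is simulated (it is not real data), and the quasi-likelihood pipeline is one reasonable choice among edgeR's workflows, not necessarily what was run on the original dataset.

```r
library(edgeR)

# Simplified design from the question: shared control, one TF per batch.
plasmid <- factor(c(rep("control", 3), rep("tf1", 3),
                    rep("control", 3), rep("tf2", 3)))
batch <- factor(c(rep("1", 6), rep("2", 6)))
design <- model.matrix(~batch + plasmid)

# All coefficients are estimable only if the design is full rank.
stopifnot(qr(design)$rank == ncol(design))

# ASSUMPTION: simulated counts, purely for illustration.
set.seed(1)
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 10), ncol = 12)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

y <- DGEList(counts = counts)
keep <- filterByExpr(y, design)          # drop lowly expressed genes
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                  # TMM normalization
y <- estimateDisp(y, design)             # dispersions use all 12 samples
fit <- glmQLFit(y, design)

# Each TF is tested against the control; with this layout, each test
# is effectively a within-batch comparison.
res_tf1 <- glmQLFTest(fit, coef = "plasmidtf1")
res_tf2 <- glmQLFTest(fit, coef = "plasmidtf2")
topTags(res_tf1)
```

Fitting all samples together means the dispersion estimates draw on every replicate, which is the reason given above for not subsetting by batch. A direct tf1 versus tf2 contrast would be estimable in this model only through the shared control, so it leans on the additivity assumption.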