Question: RNA-seq unbalanced batch effect correction
gravatar for sckinta
4 months ago by
sckinta0 wrote:


i have a set of RNAseq data with unbalanced batch effect (see table below). Batch 1 was made of single indexed kit and sequenced at time 1, while batch 2 was made of dual indexed kit and sequenced independently from batch 1.

  sample batch groups
1 Naive_Dmt3aKO_rep1 2 Naive_Dmt3aKO
2 Naive_Dmt3aKO_rep2 2 Naive_Dmt3aKO
3 Naive_Dmt3aKO_rep3 2 Naive_Dmt3aKO
4 Naive_WT_rep1 2 Naive_WT
5 Naive_WT_rep2 2 Naive_WT
6 Naive_WT_rep3 2 Naive_WT
7 Th17_Dmt3aKO_rep1 2 Th17_Dmt3aKO
8 Th17_Dmt3aKO_rep2 2 Th17_Dmt3aKO
9 Th17_Dmt3aKO_rep3 1 Th17_Dmt3aKO
10 Th17_WT_rep1 2 Th17_WT
11 Th17_WT_rep2 2 Th17_WT
12 Th17_WT_rep3 1 Th17_WT
13 Th1_Dmt3aKO_rep1 2 Th1_Dmt3aKO
14 Th1_Dmt3aKO_rep2 2 Th1_Dmt3aKO
15 Th1_Dmt3aKO_rep3 1 Th1_Dmt3aKO
16 Th1_WT_rep1 2 Th1_WT
17 Th1_WT_rep2 2 Th1_WT
18 Th1_WT_rep3 1 Th1_WT
19 Th2_Dmt3aKO_rep1 2 Th2_Dmt3aKO
20 Th2_Dmt3aKO_rep2 2 Th2_Dmt3aKO
21 Th2_Dmt3aKO_rep3 1 Th2_Dmt3aKO
22 Th2_WT_rep1 2 Th2_WT
23 Th2_WT_rep2 2 Th2_WT
24 Th2_WT_rep3 1 Th2_WT

From my exploratory analysis, I noticed batch 1 samples and batch 2 samples are clustered independently from each other.

Thus, on my DE analysis design, I used batch as covariant to evaluate the batch effect DE. 

groups <- relevel(groups, ref="Naive_KO")
batch <- relevel(batch,ref="1")
design <- model.matrix(~batch+groups, data=y$samples)
y_filtered <- estimateDisp(y_filtered,design)
fit <- glmQLFit(y_filtered, design, robust=T)

I found 24439 genes were differentially expressed btw batch 1 and batch 2.

#### batch effect 
batch_DE <- glmQLFTest(fit, coef=2)
FDR <- p.adjust(batch_DE$table$PValue, 'fdr')
sum(FDR < 0.05)
# 24439

My questions:

1. since Naive has only batch 2 samples no batch 1 samples, Can 24439 batch DE genes be caused by difference between other Th and Naive? In the linear regression, ~batch+groups , we assume batch and group are independent. theoretically, those 24439 genes should be independent of group difference, but this unbalanced design really bothers me.

2. Will this unbalance design affect differential analysis between groups? for example, comparing Th1_WT to Naive_WT. 



edger batch effect rnaseq • 170 views
ADD COMMENTlink modified 4 months ago by James W. MacDonald49k • written 4 months ago by sckinta0
Answer: RNA-seq unbalanced batch effect correction
gravatar for James W. MacDonald
4 months ago by
United States
James W. MacDonald49k wrote:

That's not what you would normally call an unbalanced design. Well, it is unbalanced (meaning you have different numbers of samples in some of the groups), but what you are talking about is confounding. The naive samples are completely confounded with batch, so you cannot say if any differences between those samples and any of the other samples is due to a batch effect or real biological differences.

That, unfortunately, is an unfixable problem.

ADD COMMENTlink written 4 months ago by James W. MacDonald49k

To clarify James' comments, note that your design is not fully confounded; it's still possible to perform contrasts between all pairs of groups. However, the DE genes between batches cannot be caused by any of the naive samples. Any inter-batch differences are estimated solely from the Th* groups that have samples in both batches.

For your second question, the differences in the number of samples between groups or batches will not compromise the validity of the DE analysis with respect to controlling the error rate. Obviously, if you had equal numbers of samples in each group (or if you had no batch effects in the first place), you would get more power, but that can't be helped now.

ADD REPLYlink modified 4 months ago • written 4 months ago by Aaron Lun22k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 181 users visited in the last hour