Question

limma: More differentially expressed genes when factor included in model than when used as blocking factor

0

Entering edit mode

Gerhard Thallinger ▴ 180

@gerhard-thallinger-1552

Last seen 5 weeks ago

Austria

I analyse bulk RNA-seq data from 21 samples in seven groups ('Group'), where each of the replicates originate from a different source ("Source"). Six contrasts are evaluated. From the MDS plot I can see a strong influence of the 'Source' variable.

I use the limma-voom workflow with three different models:

library(limma)
library(edgeR)
[...]
samples <- data.frame(Group=as.factor(rep(paste0("G", 1:7), rep=3)), Source=as.factor(rep(c(3,1,2), each=7)))
levels(samples$Source) <- c("2","3","1")

# Model 1
design <- model.matrix(~0+Group, data=samples) 
colnames(design) <- sub("Group", "", colnames(design))
dge.norm.voom <- voom(dge.norm, design, plot=TRUE)
dge.norm.voom.fit <- lmFit(dge.norm.voom)

# Model 2
design <- model.matrix(~0+Group+Source, data=samples)  
colnames(design) <- sub("Group", "", colnames(design))
dge.norm.voom <- voom(dge.norm, design, plot=TRUE)
dge.norm.voom.fit <- lmFit(dge.norm.voom)

# Model 3
design <- model.matrix(~0+Group, data=samples) 
colnames(design) <- sub("Group", "", colnames(design))
dge.norm.voom <- voom(dge.norm, design, plot=TRUE)
dge.norm.voom.cor <- duplicateCorrelation(dge.norm.voom, design, block=samples$Source)
dge.norm.voom.cor$consensus.correlation
# 0.290429
dge.norm.voom.fit <- lmFit(dge.norm.voom, design, block=samples$Source, 
                           correlation=dge.norm.voom.cor$consensus.correlation)

# Common to all models
contr.matrix <- makeContrasts(C1=G1-G7,C2=G2-G7,C3=G3-G7,C4=G4-G7,C5=G5-G7,C6=G6-G7, levels = colnames(design))
dge.norm.voom.fit <- contrasts.fit(dge.norm.voom.fit, contrasts=contr.matrix)
efit <- eBayes(dge.norm.voom.fit)
efit.dt <- decideTests(efit)
efit.dt.sum <- t(summary(efit.dt))
print(efit.dt.sum)

The three models yield the following numbers of differentially expressed genes:

             ~0+Group       |  ~0+Group+Source   |  ~0+Group (block=Source)
Contrast  Down NotSig   Up  |  Down NotSig   Up  |  Down NotSig  Up 
C1          53  18552   20  |   676  17500  449  |   571  17818 236 
C2         239  18322   64  |  1113  16828  684  |   962  17255 408 
C3         490  17936  199  |  1626  15846 1153  |  1535  16311 779 
C4          60  18539   26  |   450  17794  381  |   326  18093 206 
C5          59  18498   68  |   486  17521  618  |   410  17801 414 
C6          46  18557   22  |   369  17880  376  |   225  18241 159

Treating "Source" as a "not of interest" factor (Model #2) yields consistently more differentially expressed genes than using it as a blocking variable (Model #3).

Is this expected or may this indicate an error in the analysis?

The environment is as follows:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

locale:
[1] LC_COLLATE=English_Austria.utf8  LC_CTYPE=English_Austria.utf8    LC_MONETARY=English_Austria.utf8 LC_NUMERIC=C                     LC_TIME=English_Austria.utf8    

time zone: Europe/Vienna
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.42.4 limma_3.56.2

loaded via a namespace (and not attached):
[1] compiler_4.3.0 Rcpp_1.0.10    grid_4.3.0     locfit_1.5-9.8 statmod_1.5.0  lattice_0.21-8

voom blocking limma • 931 views

ADD COMMENT • link updated 13 months ago by Gordon Smyth 51k • written 13 months ago by Gerhard Thallinger ▴ 180

1

Entering edit mode

It's perfectly expected. Essentially it's a paired analysis removing the confounding effect of "Source".

ADD REPLY • link 13 months ago ATpoint ★ 4.5k

score 4 · Accepted Answer · 2023-10-09

Yes, it is completely to be expected. The purpose of random blocking via duplicateCorrelation etc is to recover inter-block information when the number of blocks is large and the treatments are unbalanced with respect to the blocks. Your experiment is completely balanced with only three blocks so there is no inter-block information to recover and no advantage for random effects. You should simply include Source in the design matrix as a fixed effect.

Having said that, the duplicateCorrelation analysis seems to be doing a pretty good job of approximating the fully blocked analysis and the difference between the two analyses doesn't seem dramatic.