Question: Remove batch effect using individuals as random in the model in Limma
0
19 months ago by
mheydarpour10
mheydarpour10 wrote:

I have 130 individuals with their RNAseq data in 7 batches treated in a Case-Control study (Case=37 and Control=93). I would like to run differential gene expression analysis for 18258 genes for this study in Limma. When I read some discussions about removing batch effect in the model through Bioconductor's forum, I found that the best way to remove batch effect is to consider individuals as random in the model. So, regarding this assumption I created my design matrix as: (Patient=130; Treat: Yes=37 & No=93)

design = model.matrix(~Patient+Treat)
fit= lmFit(GenExp, design)

But, after I fitting the model and run in Limma, I got the following error:

Coefficients not estimable: TreatNo
Warning message:
Partial NA coefficients for 18258 probe(s)

How can I fix this issue? any other comments for removing batch effect? Thanks for any advise.

modified 19 months ago by Aaron Lun24k • written 19 months ago by mheydarpour10
Answer: Remove batch effect using individuals as random in the model in Limma
1
19 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

I assume Patient is actually specifying what batch each patient belongs to, and that the batches are nested within your case/control conditions. This type of experimental design is usually handled in limma by using duplicateCorrelation, which is analogous to a conventional mixed effects model.

design <- model.matrix(~Treat)
dc <- duplicateCorrelation(GenExp, design=design, block=Batch)
fit <- lmFit(GenExp, design=design, block=Batch, correlation=dc\$consensus)


Thanks Aaron for the info and the codes.

But, my problem is batch situation. All my Cases=37 are in the Batch 7, and the rest of patients are separated in Batch from 1 to 7. It means I have no Cases in batch 1 to 6. That's why I decided treat patients as a random effect in the model. But I don't know that I am right? any comment. Thanks

1. I still don't see a problem with using duplicateCorrelation here.
2. Stop using "patient" and "batch" interchangeably, if you're intending to block on the latter.
3. limma has no extended support for random effects in the way that lme4 or nlme do. The closest you can get is with duplicateCorrelation, but this is probably better anyway as it shares information across genes to improve power - see A: How does edgeR compare to lme4?.

Hi Aaron,

I run my data using "duplicateCorrelation" and considered batch as block in the model. But I don't know why I got very large "LogFC" for many genes in my output. My data is normalized using Quantile normalization and I have no "NA" in my gene-expression file. Any comment?

Gene-ID        logFC      AveExpr          t    P.Value   adj.P.Val         B
MIR4461  -7151739.6177 15923526.479 -0.5029371 0.61586878 0.9335903 -5.179130
BCYRN1      -5104.4872    18245.922 -0.5955296 0.55253615 0.9191584 -5.141185
SNORD95     -2518.2164     8406.152 -0.2923091 0.77052135 0.9617879 -5.241702
SNORD104    -2482.6702     4917.511 -0.4671700 0.64116880 0.9372194 -5.192083
SNORD55     -2445.6014     8061.641 -0.3439396 0.73145342 0.9532723 -5.229422
MYL2        -2267.7890    27445.539 -0.4653459 0.64247086 0.9372194 -5.192718
MTRNR2L2    -1796.5067     9399.920 -0.5629753 0.57443187 0.9226152 -5.155250
MYH7        -1376.0078     1989.892 -2.4113620 0.01730615 0.5155984 -3.157871
MIR4442     -1156.0135     2417.504 -0.4825027 0.63026917 0.9364068 -5.186647
RPS27        -871.2428     3610.345 -0.5790886 0.56354236 0.9203275 -5.148386