Question

Remove batch effect using individuals as random in the model in Limma

0

Entering edit mode

mheydarpour ▴ 10

@mheydarpour-9430

Last seen 5.2 years ago

I have 130 individuals with their RNAseq data in 7 batches treated in a Case-Control study (Case=37 and Control=93). I would like to run differential gene expression analysis for 18258 genes for this study in Limma. When I read some discussions about removing batch effect in the model through Bioconductor's forum, I found that the best way to remove batch effect is to consider individuals as random in the model. So, regarding this assumption I created my design matrix as: (Patient=130; Treat: Yes=37 & No=93)

design = model.matrix(~Patient+Treat)
fit= lmFit(GenExp, design)

But, after I fitting the model and run in Limma, I got the following error:

Coefficients not estimable: TreatNo
Warning message:
Partial NA coefficients for 18258 probe(s)

How can I fix this issue? any other comments for removing batch effect? Thanks for any advise.

batch effect Coeficents non estimable block random effects limma design matrix • 2.7k views

ADD COMMENT • link updated 6.5 years ago by Aaron Lun ★ 28k • written 6.5 years ago by mheydarpour ▴ 10

score 1 · Answer 1 · 2017-11-12

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 19 hours ago

The city by the bay

I assume Patient is actually specifying what batch each patient belongs to, and that the batches are nested within your case/control conditions. This type of experimental design is usually handled in limma by using duplicateCorrelation, which is analogous to a conventional mixed effects model.

design <- model.matrix(~Treat)
dc <- duplicateCorrelation(GenExp, design=design, block=Batch)
fit <- lmFit(GenExp, design=design, block=Batch, correlation=dc$consensus)

ADD COMMENT • link 6.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Aaron for the info and the codes.

But, my problem is batch situation. All my Cases=37 are in the Batch 7, and the rest of patients are separated in Batch from 1 to 7. It means I have no Cases in batch 1 to 6. That's why I decided treat patients as a random effect in the model. But I don't know that I am right? any comment. Thanks

ADD REPLY • link 6.5 years ago mheydarpour ▴ 10

0

Entering edit mode

I still don't see a problem with using duplicateCorrelation here.
Stop using "patient" and "batch" interchangeably, if you're intending to block on the latter.
limma has no extended support for random effects in the way that lme4 or nlme do. The closest you can get is with duplicateCorrelation, but this is probably better anyway as it shares information across genes to improve power - see A: How does edgeR compare to lme4?.

ADD REPLY • link 6.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

I run my data using "duplicateCorrelation" and considered batch as block in the model. But I don't know why I got very large "LogFC" for many genes in my output. My data is normalized using Quantile normalization and I have no "NA" in my gene-expression file. Any comment?

Gene-ID logFC AveExpr t P.Value adj.P.Val B
MIR4461 -7151739.6177 15923526.479 -0.5029371 0.61586878 0.9335903 -5.179130
BCYRN1 -5104.4872 18245.922 -0.5955296 0.55253615 0.9191584 -5.141185
SNORD95 -2518.2164 8406.152 -0.2923091 0.77052135 0.9617879 -5.241702
SNORD104 -2482.6702 4917.511 -0.4671700 0.64116880 0.9372194 -5.192083
SNORD55 -2445.6014 8061.641 -0.3439396 0.73145342 0.9532723 -5.229422
MYL2 -2267.7890 27445.539 -0.4653459 0.64247086 0.9372194 -5.192718
MTRNR2L2 -1796.5067 9399.920 -0.5629753 0.57443187 0.9226152 -5.155250
MYH7 -1376.0078 1989.892 -2.4113620 0.01730615 0.5155984 -3.157871
MIR4442 -1156.0135 2417.504 -0.4825027 0.63026917 0.9364068 -5.186647
RPS27 -871.2428 3610.345 -0.5790886 0.56354236 0.9203275 -5.148386

ADD REPLY • link 6.5 years ago mheydarpour ▴ 10