Remove batch effect using individuals as random in the model in Limma
1
0
Entering edit mode
mheydarpour ▴ 10
@mheydarpour-9430
Last seen 5.8 years ago

I have 130 individuals with their RNAseq data in 7 batches treated in a Case-Control study (Case=37 and Control=93). I would like to run differential gene expression analysis for 18258 genes for this study in Limma. When I read some discussions about removing batch effect in the model through Bioconductor's forum, I found that the best way to remove batch effect is to consider individuals as random in the model. So, regarding this assumption I created my design matrix as: (Patient=130; Treat: Yes=37 & No=93)

design = model.matrix(~Patient+Treat)
fit= lmFit(GenExp, design)

But, after I fitting the model and run in Limma, I got the following error:

Coefficients not estimable: TreatNo
Warning message:
Partial NA coefficients for 18258 probe(s)

How can I fix this issue? any other comments for removing batch effect? Thanks for any advise.

 

batch effect Coeficents non estimable block random effects limma design matrix • 3.1k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 3 hours ago
The city by the bay

I assume Patient is actually specifying what batch each patient belongs to, and that the batches are nested within your case/control conditions. This type of experimental design is usually handled in limma by using duplicateCorrelation, which is analogous to a conventional mixed effects model.

design <- model.matrix(~Treat)
dc <- duplicateCorrelation(GenExp, design=design, block=Batch)
fit <- lmFit(GenExp, design=design, block=Batch, correlation=dc$consensus)
ADD COMMENT
0
Entering edit mode

Thanks Aaron for the info and the codes.

But, my problem is batch situation. All my Cases=37 are in the Batch 7, and the rest of patients are separated in Batch from 1 to 7. It means I have no Cases in batch 1 to 6. That's why I decided treat patients as a random effect in the model. But I don't know that I am right? any comment. Thanks  

ADD REPLY
0
Entering edit mode
  1. I still don't see a problem with using duplicateCorrelation here.
  2. Stop using "patient" and "batch" interchangeably, if you're intending to block on the latter.
  3. limma has no extended support for random effects in the way that lme4 or nlme do. The closest you can get is with duplicateCorrelation, but this is probably better anyway as it shares information across genes to improve power - see A: How does edgeR compare to lme4?.
ADD REPLY
0
Entering edit mode

Hi Aaron,

I run my data using "duplicateCorrelation" and considered batch as block in the model. But I don't know why I got very large "LogFC" for many genes in my output. My data is normalized using Quantile normalization and I have no "NA" in my gene-expression file. Any comment?

Gene-ID        logFC      AveExpr          t    P.Value   adj.P.Val         B
MIR4461  -7151739.6177 15923526.479 -0.5029371 0.61586878 0.9335903 -5.179130
BCYRN1      -5104.4872    18245.922 -0.5955296 0.55253615 0.9191584 -5.141185
SNORD95     -2518.2164     8406.152 -0.2923091 0.77052135 0.9617879 -5.241702
SNORD104    -2482.6702     4917.511 -0.4671700 0.64116880 0.9372194 -5.192083
SNORD55     -2445.6014     8061.641 -0.3439396 0.73145342 0.9532723 -5.229422
MYL2        -2267.7890    27445.539 -0.4653459 0.64247086 0.9372194 -5.192718
MTRNR2L2    -1796.5067     9399.920 -0.5629753 0.57443187 0.9226152 -5.155250
MYH7        -1376.0078     1989.892 -2.4113620 0.01730615 0.5155984 -3.157871
MIR4442     -1156.0135     2417.504 -0.4825027 0.63026917 0.9364068 -5.186647
RPS27        -871.2428     3610.345 -0.5790886 0.56354236 0.9203275 -5.148386

ADD REPLY

Login before adding your answer.

Traffic: 577 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6