Question

limma design model for categorical variable

Jitendra ▴ 10

@nabiyogesh-11718

United Kingdom

Hi all,

Could anyone please help me design a model in limma to test the association of categorical variables with methylation EPIC array data?

I am using the model below but keep getting an error; status is the categorical variable here:

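# Model matrix: status is the categorical variable of interest
library(limma)

var1 <- model.matrix(~ status + as.factor(sex) + Age + CD8T + CD4T + NK + Bcell + Mono + smoking + PC1 + PC2 + PC3, data = targets2)

# Fit one linear model per probe to the M-values
fit1 <- lmFit(mval, var1)

# Empirical Bayes moderation (pass the fitted object fit1, not an undefined fit)
fit1 <- eBayes(fit1, trend = TRUE, robust = TRUE)

# coef = 2 is the second design column, i.e. the first status coefficient
probe <- topTable(fit1, adjust.method = "BH", coef = 2, number = Inf)

# Keep probes significant at 5% FDR
sig.probe <- probe[which(probe$adj.P.Val <= 0.05), ]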

Many thanks

DNAMethylation limmaGUI biostatistics limma • 609 views

ADD COMMENT • link updated 5 months ago by Gordon Smyth 51k • written 5 months ago by Jitendra ▴ 10

score 0 · Answer 1 · 2024-02-06

Gordon Smyth 51k

@gordon-smyth

WEHI, Melbourne, Australia

The error message would tell you what the problem is and hence how to fix it.

You haven't told us what the error message is or even which line of code produced the error. I doubt that the error has anything to do with having a categorical variable in the model.

ADD COMMENT • link 5 months ago Gordon Smyth 51k

Thanks. This model previously worked well when analyzing the association between continuous variables and DNA methylation. However, when I try to assess the relationship with categorical variables, the script appears to hang indefinitely on the high-performance computing (HPC) system. In contrast, the analysis of continuous variables took only about two hours to complete.

But will try again!

ADD REPLY • link 5 months ago Jitendra ▴ 10

How many levels does status have? Please type table(status) and show the output.

What you are describing is not a limma error, but an issue with the size of the dataset and your computational resources, for example running out of memory.

Computation time in limma is determined by the size of the dataset (which you haven't described) and by the number of columns in the design matrix. For a given number of design matrix columns, computation time is unaffected by whether the original variables were continuous or categorical. The problem has nothing to do with categorical vs continuous variables but simply with the size of the fitted model.
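To make the point about design matrix columns concrete, here is a minimal sketch in base R. The data frame `toy` and its values are invented for illustration (they are not the poster's `targets2`); only the variable names echo the question. It shows that a factor with k levels contributes k - 1 columns, whereas a numeric covariate contributes one.

```r
# Hypothetical toy data: names mirror the question, values are made up.
set.seed(1)
n <- 12
toy <- data.frame(
  status = factor(rep(c("control", "mild", "severe"), each = 4)),
  sex    = factor(rep(c("F", "M"), times = 6)),
  Age    = round(runif(n, 20, 70))
)

# Number of samples in each level of the categorical variable
print(table(toy$status))

# Build the design matrix with R's default treatment contrasts:
# a k-level factor adds k - 1 columns, a numeric covariate adds 1.
design <- model.matrix(~ status + sex + Age, data = toy)
print(colnames(design))
print(ncol(design))
```

In this toy example the three-level status adds two columns, so `coef = 2` in `topTable` would refer only to the first non-reference level. If all status levels are of interest, one option is to pass every status column to `coef` (for example `coef = 2:3` here), which gives an F-test across those coefficients.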

I note that you have posted several previous questions on this forum where you reported successfully fitting limma models to methylation data including categorical variables, so you must already know from your own experience that there's no particular problem with categorical variables. Your current model has several categorical variables, not just status but also sex and probably smoking.

ADD REPLY • link 5 months ago Gordon Smyth 51k

Thanks, Gordon Smyth, it is sorted now. It was just the large sample size. The command above works perfectly for association analysis with both continuous and categorical variables.

ADD REPLY • link 5 months ago Jitendra ▴ 10