Question: Effects of continuous variable with adjustment for gender
gravatar for jhj89
2.8 years ago by
jhj8910 wrote:


I am trying to find a list of genes significantly associated with a continuous variable (e.g. glucose) so I used edgeR to create a design matrix as follows:

design <- model.matrix(~glucose)

​Followed by: 

disp <- estimateDisp(y, design, robust = TRUE)
fit <- glmFit(disp, design, robust = TRUE)
lrt <- glmLRT(fit, coef = 2)
topTags(lrt, n=Inf, adjust.method = "BH", p.value = 0.05)

to give a list of genes associated with glucose.

In addition, I wanted to adjust for gender, so I included gender as a factor.

design <- model.matrix(~glucose+gender)

However, doing so resulted in gender-related genes (coming from Y chromosome) coming up as significantly associated. At this point, I am not sure what's going on.

Separately, I tried out limma-voom to replicate what I have done above. <- model.matrix(~glucose+gender)
limvm <- voom(y,, plot = TRUE)
fit <- lmFit(limvm,
fit <- eBayes(fit)
topTable(fit, coef=2, adjust.method = "BH", p.value = 0.05, number = Inf)

which I think should have done the same thing, but instead there are no gender-related genes, and the number of genes are much less. To find glucose associated genes while adjusting for gender, am I doing this correctly? Thank you.

ADD COMMENTlink modified 2.8 years ago by Gordon Smyth35k • written 2.8 years ago by jhj8910

When using edgeR, are you saying that testing for glucose (coef=2) gave Y chromosome genes as DE when using ~glucose+gender but not when using ~glucose alone. Or are you saying that you got Y chromosome genes as DE with both design matrices?

ADD REPLYlink written 2.8 years ago by Gordon Smyth35k

Y chromosome genes were given when ~glucose+gender was used but not when ~glucose alone was used. 

But from what I understand from your comments below, I can either:

- remove sex-linked genes from the analysis or,

- don't include sex into the model (i.e. just use ~glucose) as robust = TRUE can down-weight the sex-linked genes? 

ADD REPLYlink written 2.8 years ago by jhj8910


ADD REPLYlink written 2.8 years ago by Gordon Smyth35k
gravatar for Gordon Smyth
2.8 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

To make the limma-voom analysis similar to the edgeR analysis, you would use

fit <- eBayes(fit, robust=TRUE)

since you've done the equivalent for edgeR.

It is very hard for a linear model to fully adjust for gender effects, because the Y chromosome genes don't have any counts for females, so making it hard to estimate the baseline. In this situation, I find it better to explicitly remove the sex-linked genes (XIST and the Y chromosome) from the analysis, rather than trying to do the adjustment in the linear model.

Alternatively you can just trust that the robust=TRUE method will down-weight any sex-linked genes automatically for you (when gender is not included in the model). It seems it has already done this. To understand why this is the case, have a read of the case study in this preprint:

Note, this assumes that you remove gender from the design matrix.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Gordon Smyth35k

Thank you Gordon for your answer.

As you suggested, I have done 

fit <- eBayes(fit, robust=TRUE)

but the number of significantly associated genes didn't change much (it decreased a little). Pardon my naivety, but does log-cpm values in Voom contributing to this?

In regards to difficulty adjusting for gender, what you are saying does make sense. I will try removing sex-linked genes and perform my analysis.

I don't fully understand your last statement though. adjustment for gender in a robust model (in edgeR) did indeed have sex-linked genes coming up as significantly associated. What do you mean by a robust model down-weighting any sex-linked genes automatically? Thank you again for your time.

ADD REPLYlink written 2.8 years ago by jhj8910

No, I don't think I've made myself entirely clear. The robust methods in limma and edgeR will downweight sex-linked genes when gender in _not_ included in the linear model. If you want the robust method to work this way, then you can't include gender in the model as well.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Gordon Smyth35k

Thank you for the reading material. I understand that using robust algorithm can down-weight sex-linked genes. However, is the robust algorithm and down-weighting of sex-linked genes only applicable in Limma? Because doing so in edgeR didn't down-weight the sex-linked genes.

I went back and looked at my samples and realised that high glucose values skew towards males (more males have high glucose than females). Can that affect the genes coming up (such as having sex-related genes coming up) when adjusting for gender? 

ADD REPLYlink written 2.8 years ago by jhj8910

If robust=TRUE in estimateDisp doesn't have much effect, consider using robust=TRUE in glmQLFit, which is a closer analogue to what eBayes does. As for your second question, the adjustment for sex should have been more conservative with respect to detecting these sex-linked genes, i.e., the adjustment should reduce the significance of the glucose effect on these genes. This is because the sex-related coefficients would absorb any DE that might otherwise be attributed to glucose levels (given that the glucose levels are confounded with sex).

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Aaron Lun21k

Thank you for your answer:

Just a side question: I was reading this post from biostars ( and I was wondering how should I treat the continuous variable being included in the model. Should I include as it is, or log-transform them? I guess it kind of makes sense to log-transform a continuous variable if a value of one sample is 50x different from that of another, and I understand that the number of DE genes can change, but what would be the best practice of deciding what to do?

ADD REPLYlink written 2.8 years ago by jhj8910

If glucose is so clearly sex-linked in your data, then it may be perfectly correct to get Y chromosome genes as DE when testing for glucose.

There is no major difference between limma and edgeR in this regard.

ADD REPLYlink written 2.8 years ago by Gordon Smyth35k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 161 users visited in the last hour